THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 


Volume XXXII January, 1941 Number 1 


MONTH AND SEASON OF BIRTH IN RELATION 
TO INTELLIGENCE, INTROVERSION-EXTRAVERSION, 
AND INFERIORITY FEELINGS 


GEORGE FORLANO 
Teachers College, Columbia University
AND 
VIRGINIA ZERILLI EHRLICH 
New York City 


The problem of the relationship between the month of birth and 
physiological and psychological development of the human organism 
has been a fascinating and important one to many investigators. 
Sanders⁷ has reviewed studies concerning the relationship between
birth month and physiological condition of the child. The major part 
of the evidence that he reviewed indicates that children born in Sum- 
mer or in Autumn are heavier than those born in the other seasons. 

A review of the studies on the psychological factors in relation to 
birth month reveals a unanimity in the general conclusion drawn; 
namely, that children born in the Winter months (December—March) 
are, as a group, duller than those born in the other seasons. Blonsky¹
found the lowest mean IQ for children born in Winter. Pintner³ in
1931 studied forty-nine hundred twenty-five cases and found the lowest
IQ among those born in Winter. Pintner and Forlano⁴ in 1933 grouped
seventeen thousand five hundred two cases according to socio-economic 
status and found the lowest mean IQ to occur in the Winter months 
(January-March) regardless of whether the children came from low, 
average, or high socio-economic status. Their findings definitely 
showed that month of birth is a factor in the intelligence test scores of 
children. Later, in 1937, Pintner and Maller⁶ studied the possible
variations that might be caused by combining different ethnic groups.
Their sample of cases consisted of school children belonging to three
ethnic groups; namely, Italian, Jewish, and Negro. The results of the
Pintner-Maller study indicated that those born in the warm months 
(June-November) are on the average brighter than those born in the 
cold months (December-May). This general difference prevailed in 
each of the three ethnic groups. More recently, Fialkin and Beckman²
studied the test scores of thirty-one hundred eighty-nine adults. The 
test scores were obtained from the records of clients of the Adjustment 
Service of New York City. The results obtained from this study of 
the scores of adults, in general, agree with those obtained with children. 

In 1939 Pintner and Forlano⁵ raised the question as to whether
the same tendency found in the Northern Hemisphere, namely, the
slightly lower average IQ of children born in Winter, would be found
also in the Winter months in the Southern Hemisphere, where the
seasons are reversed. Their data from the Southern Hemisphere tend
slightly to confirm previous findings in the Northern Hemisphere.

The study to be reported here seeks: First, the relation between 
month of birth and intelligence scores of adolescent males and, second, 
the relation between month of birth and certain aspects of personality;
namely, introversion-extraversion and inferiority feeling. 


DESCRIPTION OF THE POPULATION 


The total number of cases employed in this report was seventy- 
eight hundred ninety-seven.* All cases were male students of a large 
public college in New York City. The test records were selected at 
random from the college files for entering students for the years from 
1929 through 1936 inclusive. 


THE TESTS


All matriculated students at entrance to college are given the
Thurstone Psychological Examination, an intelligence test, the I-X
test, and the Feradcom test. The latter two were devised or adapted by
Payne.

* Some students missed taking one or both of the personality tests so that for
the I-X test the cases numbered seventy-eight hundred twenty-seven, whereas the
number for the Feradcom test was seventy-eight hundred thirty-seven.

† We wish to thank Professor Arthur F. Payne, formerly Director of the
Student Personnel Bureau of the College of the City of New York, for his permission
to make use of the files. The I-X test is an adaptation of L. R. Marston’s
Personality Rating Scale.

The one referred to as the I-X test affords presumably a measure
of introversion-extraversion. The greater the score the more extraverted
the person is supposed to be. Sample statements from this
test are as follows:


a. I indulge in day dreams and reveries, thinking of what I 

b. I seldom like to argue and force people to accept my ideas.. True False 

c. I never break in on the conversation of people sitting near me. True False 


The reliability of the I-X test by the split-half technique was found to
be .78 by the Spearman-Brown prophecy formula. It must be remembered
that the terms introversion-extraversion are used with many different
meanings. Various tests so named may show low or even
negative correlations among themselves. In this study the terms will
always be used in quotation marks to indicate only the kind of
introversion-extraversion measured by the I-X test.
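
The stepping-up of a split-half correlation by the Spearman-Brown prophecy formula can be illustrated with a minimal sketch. The report gives only the stepped-up figure of .78; the half-test correlation computed below is therefore an inference from that figure, not a value stated by the authors, and the function names are hypothetical.

```python
def spearman_brown(r_part: float, factor: float = 2.0) -> float:
    """Reliability of a test `factor` times as long as the part that
    yielded the correlation r_part (factor=2 is the split-half case)."""
    return factor * r_part / (1.0 + (factor - 1.0) * r_part)


def implied_half_correlation(r_full: float) -> float:
    """Invert the split-half case: the correlation between halves that
    would step up to the full-test reliability r_full."""
    return r_full / (2.0 - r_full)


# Assumption: .78 is the stepped-up (full-length) reliability of the I-X test.
r_half = implied_half_correlation(0.78)
print(round(r_half, 3))                   # correlation between the two halves
print(round(spearman_brown(r_half), 2))   # stepped up again to full length
```

Under this reading, a correlation of roughly .64 between the two halves of the test would correspond to the reported reliability of .78.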

The Payne Feradcom test affords a measure of feeling of inade- 
quacy. The greater the score on this scale the greater the tendency 
toward inferiority feeling. The name Feradcom is an abbreviation for 
the Adlerian inferiority complex. The split-half reliability for this 
test is about .76. Some items from this test are as follows: 


a. I am much troubled by doubts about God and immortality.. True False 
b. I often feel that I am set apart and different from other 


c. I welcome responsibilities of all sorts............. True False 


TABULATION OF DATA 


A distribution of test scores for each test and for each month was 
made. In addition, the months of March, June, September and 
December were subdivided into two groups, one for the first twenty 
days, and the other for the rest of the month. This made a total of 
sixteen distributions for each test. The division in the case of the 
four months just mentioned made it possible later to combine the 
months into seasonal groups of either trimonthly units or according to 
astronomical delimitations. According to the grouping by trimonthly 
units, the Winter season would extend from the first of January to the 
last day in March and thus make it possible to present our data in a 
form comparable with those of previous investigators. The grouping 
by astronomical seasons, e.g., Winter, December 21st through March 
20th, seemed a more accurate procedure to follow. 
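
The two methods of seasonal grouping described above may be sketched as follows. This is an illustration only, using the date limits given in the text; the function names and the sample dates are hypothetical and form no part of the original tabulation.

```python
from datetime import date


def trimonthly_season(d: date) -> str:
    """Season by calendar quarters: Winter is January 1 through March 31."""
    return ("Winter", "Spring", "Summer", "Autumn")[(d.month - 1) // 3]


def astronomical_season(d: date) -> str:
    """Season by equinoxes and solstices: Winter is December 21 through March 20."""
    md = (d.month, d.day)
    if (3, 21) <= md <= (6, 20):
        return "Spring"
    if (6, 21) <= md <= (9, 20):
        return "Summer"
    if (9, 21) <= md <= (12, 20):
        return "Autumn"
    return "Winter"


def half_year(season: str) -> str:
    """Combine seasons into the cold (Autumn, Winter) and warm (Spring, Summer) halves."""
    return "Cold" if season in ("Autumn", "Winter") else "Warm"


# A birthday late in March changes season, and half-year, with the method used.
d = date(1915, 3, 25)
print(trimonthly_season(d), half_year(trimonthly_season(d)))
print(astronomical_season(d), half_year(astronomical_season(d)))
```

Only birthdays falling after the twentieth of March, June, September, or December are classified differently by the two methods, which is why those four months were subdivided.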


DISTRIBUTION OF INTELLIGENCE SCORES BY MONTH OF BIRTH 


Table I shows a summary of the means and standard deviations of 
the intelligence scores by month of birth. In agreement with the
findings of other investigators, the means for the Winter months tend
to be lower than those for Summer months. The means for the months 
of November, December, January and February range from 209.17 to 
211.64. The means for the other months excluding September range 
from 212.06 to 215.47. September is an exception, but its mean is 
approximately the same as that of February, which has the highest 
mean of the Winter months. 


TABLE I. MEAN AND STANDARD DEVIATION OF INTELLIGENCE, I-X, AND FERADCOM
SCORES BY MONTH OF BIRTH

(Number = number of students; S.D. = standard deviation)

               Intelligence scores    |      I-X scores       |    Feradcom scores
Month         Number   Mean    S.D.   | Number  Mean    S.D.  | Number  Mean    S.D.
January         689   209.17  47.99   |   656   63.22  11.    |   658   40.85  12.45
February        619   211.64  50.88   |   602   63.16  10.80  |   595   39.82  11.85
March           668   213.29  49.50   |   657   62.58  11.25  |   660   40.60  12.05
April           630   215.47  50.37   |   631   63.45  11.25  |   633   40.25  12.40
May             692   213.21  46.46   |   689   60.75  11.75  |   694   40.85  11.95
June            721   214.30  48.40   |   717   62.77  11.10  |   719   40.28  12.90
July            627   212.06  48.40   |   623   62.87  11.40  |   626   41.14  12.85
August          641   213.50  50.28   |   640   61.89  11.15  |   641   41.04  12.45
September       725   211.50  49.40   |   734   63.08  11.35  |   727   41.20  12.40
October         651   214.33  48.90   |   648   63.57  11.35  |   650   40.63  12.40
November        594   210.40  47.75   |   592   63.12  11.40  |   595   40.86  12.60
December        640   210.50  49.60   |   638   63.50  10.85  |   639   39.76  13.00


How statistically significant are these monthly differences? The
standard ratios, that is, the ratio of the mean difference to the standard
error of the difference, are presented in Table III. The month with
the highest intelligence score is April with a mean of 215.47. January
has the lowest mean score of 209.17. The standard error of the
difference between these two monthly means is 2.71. The standard ratio
for this mean difference is 2.32. In other words, the chances are
989.9 in 1000 that a similar mean difference would be found if this
investigation were to be repeated under similar conditions. The
differences among the other months would probably be less than the one
between January and April. The results are in line with those of
previous investigators who found that the lowest mean IQ occurred in the
Winter months.
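
The April-January comparison can be retraced with a short sketch. The report does not print the formulas used; the computation below assumes the usual standard error of the difference between two independent means and a one-sided normal-curve reading of the "chances in 1000." Under those assumptions it agrees closely with the figures quoted in the text. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def standard_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (mean difference, standard error of the difference, ratio).

    Assumption: the standard error is sqrt(sd_a^2/n_a + sd_b^2/n_b),
    the usual formula for two independent groups.
    """
    diff = mean_a - mean_b
    se_diff = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return diff, se_diff, diff / se_diff


# April and January intelligence scores, taken from Table I.
diff, se_diff, ratio = standard_ratio(215.47, 50.37, 630, 209.17, 47.99, 689)

# Assumption: "chances in 1000" is the one-sided normal probability for the ratio.
chances_in_1000 = 1000 * NormalDist().cdf(ratio)

print(round(diff, 2), round(se_diff, 2), round(ratio, 2))
print(round(chances_in_1000, 1))
```

The same function may be applied to any pair of rows in Tables I and II to check the remaining standard ratios.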


TABLE II. MEAN AND STANDARD DEVIATION OF INTELLIGENCE, I-X, AND FERADCOM
SCORES BY SEASON OF BIRTH

(Number = number of students; S.D. = standard deviation)

                                     Intelligence scores   |      I-X scores       |    Feradcom scores
Season*                              Number   Mean   S.D.  | Number  Mean    S.D.  | Number  Mean    S.D.
Autumn: October 1 to December 31      1885   211.80  48.80 |  1878   63.40  11.20  |  1884   40.36  12.60
Winter: January 1 to March 31         1976   211.80  48.84 |  1915   62.98  11.25  |  1913   40.48  12.15
Cold: October 1 to March 31           3861   211.80  48.80 |  3793   63.20  11.25  |  3797   40.40  12.40
Spring: April 1 to June 30            2043   214.30  48.40 |  2037   62.86  11.25  |  2046   40.46  12.40
Summer: July 1 to September 30        1993   212.30  49.40 |  1997   62.36  11.30  |  1994   41.14  12.55
Warm: April 1 to September 30         4036   213.30  48.90 |  4034   62.75  11.25  |  4040   40.80  12.50

Autumn: September 21 to December 20   1881   213.00  48.80 |  1884   63.40  11.30  |  1882   40.46  12.47
Winter: December 21 to March 20       1973   209.90  49.50 |  1913   63.00  11.15  |  1910   40.45  12.48
Cold: September 21 to March 20        3854   211.40  49.20 |  3797   63.20  11.25  |  3792   40.45  12.45
Spring: March 21 to June 20           2045   214.40  48.50 |  2040   62.95  11.30  |  2048   40.38  12.29
Summer: June 21 to September 20       1998   212.50  49.20 |  1990   62.47  11.25  |  1997   41.15  12.64
Warm: March 21 to September 20        4043   213.50  48.80 |  4030   62.75  11.25  |  4045   40.75  12.45

* We have grouped the seasons in two ways. The grouping in the lower part of the table is
according to astronomical delimitations; whereas, the grouping in the upper half of the table is not,
but follows the grouping used in previous studies.

When we group by trimonthly units, for example, the Winter season
consisting of the months of January, February and March, we find that
Autumn and Winter have lower mean intelligence scores than either
Spring or Summer. The results are presented in Table II. The greatest
mean difference is between Spring and Autumn or Winter. The
ratio of the mean difference between Spring and Winter to the sigma
difference is 1.63, which is not a statistically significant ratio. The
results are presented in Table III. A similar result can be expected
for a comparison between Spring and Autumn.

Since there was a possibility that the method of grouping the 
months by seasons in trimonthly units might affect the trend of results, 
a second analysis was made according to astronomical limits. The 
seasons were taken as beginning and ending with the dates of the vernal 
and autumnal equinoxes and the Summer and Winter solstices. This
regrouping caused interesting changes in the mean intelligence scores.
These are given in Table II. When Autumn was defined as beginning on
September 21st and ending on December 20th, the warmer days of
September were included and the colder days of December excluded,
and the mean score rose 1.20 points. By defining Winter as beginning on
December 21st and ending on March 20th, the warmer part of March 
was eliminated and the colder days of December included. The mean 
for Winter then fell by 1.90 points. The standard ratio for the mean 
difference between Spring and Winter is 2.90, which is practically 
statistically significant. The results are given in Table III. 


TABLE III. COMPARISON BETWEEN HIGHEST AND LOWEST MEAN INTELLIGENCE SCORES

                                                       Standard    Mean        σ            Diff./
Month or season                       Number   Mean    deviation   difference  difference   σ diff.*
Month:  April                           630   215.47    50.37
        January                         689   209.17    47.99        6.30        2.71         2.32
Season: April 1 to June 30 (Spring)    2043   214.30    48.40
        January 1 to March 31 (Winter) 1976   211.80    48.84        2.50        1.53         1.63
Season: March 21 to June 20 (Spring)   2045   214.40    48.50
        December 21 to March 20 (Winter) 1973 209.90    49.50        4.50        1.55         2.90

* In the text this ratio is referred to as the standard ratio in order to
differentiate it from the critical ratio of McGaughy and the experimental
coefficient of McCall.


By grouping the two colder seasons and the two warmer seasons by
either method, we find that the mean intelligence score is greater for
the warmer seasons in both cases (Table II); but neither mean
difference yields a statistically significant ratio. The standard ratios are
given in Table IV. Pintner and Forlano⁴, using more than twice the
number of cases employed here, found a similar tendency which yielded
a ratio of 3.57, which is considered statistically significant.

At this point it is worth while to note that the regrouping of seasons 
according to vernal and autumnal equinoxes and Summer and Winter 
solstices tended to increase the standard ratios in the comparisons 
involving Winter and Spring and the cold and warm months. These 
results were presented in Tables III and IV. 

TABLE IV. COMPARISON BETWEEN MEANS OF WARM AND COLD MONTHS ON
INTELLIGENCE SCORES

                                                   Standard    Mean        Sigma        Diff./
Season                            Number   Mean    deviation   difference  difference   σ diff.
Cold: October 1 to March 31        3861   211.80    48.80
Warm: April 1 to September 30      4036   213.30    48.90        1.50        1.10         1.36
Cold: September 21 to March 20     3854   211.40    49.20
Warm: March 21 to September 20     4043   213.50    48.80        2.10        1.10         1.91

In what season can we expect the highest mean intelligence score
and in what season can we expect the lowest? Table V summarizes
all the results on this point to date. A perusal of Table V shows that
there is no agreement as to which season gives the highest intelligence
rating. However, five of the six studies reviewed here show the mean
for Spring equaling or surpassing the means for the other seasons.
Every study shows the lowest mean intelligence score occurring in the
Winter months. The means for Summer and Autumn are very much
alike, and in some cases practically identical. The seasonal trimonthly
groupings of the Forlano-Ehrlich study give identical means for
Autumn and Winter. However, using a more accurate and standard
grouping; namely, that according to astronomical delimitations, the
mean for Winter is noticeably decreased while that for Autumn is
substantially increased. Using the astronomical grouping for seasons,
we may conclude that the highest mean intelligence score will most
probably occur in Spring while the lowest mean IQ will most probably
occur in Winter.

TABLE V. SEASONAL MEAN IQ'S OR INTELLIGENCE SCORES

Study                Subjects                      Number   Spring   Summer   Autumn   Winter
Blonsky              Backward children                453    84.3     81.5     81.3     80.1
Pintner              School children                4,925    97.20    97.20    97.10    95.95
Pintner, Forlano     Children                      17,502   102.35   102.06   101.83   100.65
Pintner, Maller      Colored and white children     6,353    94.9     96.4     96.5     94.5
Fialkin, Beckman     Adults                         3,189     6.69     6.66     6.58     6.53
Forlano, Ehrlich*    Male college students          7,897   214.30   212.30   211.80   211.80
Forlano, Ehrlich†    Male college students          7,897   214.40   212.50   213.00   209.90

* According to trimonthly groupings.
† According to astronomical groupings.




MONTH OF BIRTH AND PERSONALITY 


Investigation of this phase of the problem is exploratory and the 
conclusions that may be drawn are provisional until reports of other 
investigators are received and examined as to consistency of findings. 

A greater score on the I-X test indicates a greater tendency
toward extraversion. The analysis of the data on the I-X test is
given in Table I. The results show slight variations among the mean
scores from month to month. There seems to be a general tendency
for the warm months from May through August to have lower mean
scores than the other months. In other words, persons born during
these warm months tend to be slightly more introverted than those
born during other months. The greatest mean difference is between
May with a mean of 60.75 and October with a mean of 63.57. The
standard ratio for this difference is 4.48, which is statistically significant.
This is given in Table VI.

TABLE VI. COMPARISON BETWEEN HIGHEST AND LOWEST MEAN ON THE I-X SCALE

                                                          Standard    Mean        σ             Diff./
Month or season                          Number   Mean    deviation   difference  difference    σ diff.
Month:  October                            648    63.57    11.35
        May                                689    60.75    11.75        2.82      [illegible]     4.48
Season: October 1 to December 31 (Autumn) 1878    63.40    11.20
        July 1 to September 30 (Summer)   1997    62.36    11.30        1.04        .36           2.88
Season: September 21 to December 20 (Autumn) 1884 63.40    11.30
        June 21 to September 20 (Summer)  1990    62.47    11.25         .93        .36           2.58


Grouping the months by seasons either in astronomical or
trimonthly units, we find that the tendency remains the same. The
higher mean I-X scores, presented in Table II, occur in the cold seasons
of Autumn and Winter. The lower mean I-X scores occur during the
warmer seasons of Spring and Summer. The greatest differences occur
between Summer and Autumn; the standard ratios, given in Table VI,
being 2.88 for the trimonthly grouping and 2.58 for the astronomical
grouping.

We do not obtain as marked trends when the seasons are combined
into the larger units of cold and warm months, but the general tendency
is the same. The standard ratios are given in Table VII.




TABLE VII. COMPARISON BETWEEN MEANS OF WARM AND COLD MONTHS ON THE
I-X SCALE

                                  Number of          Standard    Mean        σ            Diff./
Season                            students    Mean   deviation   difference  difference   σ diff.
Cold: October 1 to March 31         3793     63.20    11.25
Warm: April 1 to September 30       4034     62.75    11.25         .45        .25          1.80
Cold: September 21 to March 20      3797     63.20    11.25
Warm: March 21 to September 20      4030     62.75    11.25         .45        .25          1.80

Turning now to an analysis of the results on the Feradcom test we
find that the mean differences from month to month are very small.
It will be remembered that the greater the score on the Feradcom test
the greater the tendency toward feelings of inferiority. Table I shows
the mean Feradcom score by months. The range of means is from
39.76 for December to 41.20 for September, a difference of 1.44 points.
The standard error of this difference is .69 and the standard ratio 2.09.
The latter ratio is given in Table VIII.

Table II presents the Feradcom results according to seasonal
groupings. When analyzed according to trimonthly units, the greatest
difference is that between Summer and Autumn. The mean difference
is .78 and the standard ratio, given in Table VIII, is 1.95. When
analyzed according to astronomical seasonal groupings the highest
mean Feradcom score given in Table II occurs again in Summer with
a mean of 41.15. The lowest mean Feradcom score occurs in Spring
with a mean of 40.38. The standard ratio of this mean difference is
1.97.

TABLE VIII. COMPARISON BETWEEN HIGHEST AND LOWEST MEAN ON THE
FERADCOM SCALE

                                                          Standard    Mean        σ             Diff./
Month or season                          Number   Mean    deviation   difference  difference    σ diff.
Month:  September                          727    41.20    12.40
        December                           639    39.76    13.00        1.44        .69           2.09
Season: July 1 to September 30 (Summer)   1994    41.14    12.55
        October 1 to December 31 (Autumn) 1884    40.36    12.60         .78      [illegible]     1.95
Season: June 21 to September 20 (Summer)  1997    41.15    12.64
        March 21 to June 20 (Spring)      2048    40.38    12.29         .77        .39           1.97

Table IX shows a comparison between the half-year groupings of 
cold and warm months. The comparison shows small mean differences 
of .40 and .30 with standard ratios of 1.43 and 1.07, respectively. 

None of the standard ratios reported for the Feradcom test is
large enough to be considered statistically significant. The
significance of any of the results lies only in their consistency. There seems
to be a trend for persons born during the Summer months to have
somewhat stronger feelings of inferiority.

TABLE IX. COMPARISON BETWEEN MEANS OF WARM AND COLD MONTHS ON THE
FERADCOM SCALE

                                                   Standard    Mean        σ            Diff./
Season                            Number   Mean    deviation   difference  difference   σ diff.
Cold: October 1 to March 31        3797    40.40    12.40
Warm: April 1 to September 30      4040    40.80    12.50         .40        .28          1.43
Cold: September 21 to March 20     3792    40.45    12.45
Warm: March 21 to September 20     4045    40.75    12.45         .30        .28          1.07


We have seen that students who were born in warm months tend on 
the average to have scores indicating introversion and average scores 
indicating inferiority feelings. Moreover, persons born during the 
warm months were found to have a higher mean intelligence score than 
those born during the cold months. It will be noted that introversion, 
higher mean intelligence score, and greater inferiority feelings appear 
during the warmer months. Perhaps at this point it might be well to 
ask: Are the more intelligent more introverted, and have they more 
inferiority feelings? In other words, is intelligence correlated with 
introversion and inferiority feelings? 

To answer the question regarding the correlation between
intelligence and our two personality traits, we employed four sets of data.
We selected at random two hundred eighty-eight cases from each of the
following freshman years: September, 1930; September, 1931;
September, 1933; and February, 1937. For each year three coefficients of
correlation were computed; namely, that between Thurstone
Psychological and I-X, Thurstone Psychological and Feradcom, and I-X
and Feradcom. The correlations are presented in Table X.

The correlations indicate that there is practically no relation
between intelligence and each of our two personality traits. Persons
with high intelligence scores may make high or low scores on the
personality tests. We can confidently say, for example, that the warm
months did not give a lower average I-X score (that is, a tendency
toward introversion) than the colder months because introversion goes
with a higher average intelligence score.
The correlations between I-X and Feradcom are reliably different from zero;
they range from -.27 to -.39. The more introverted a person is
the more likely it is that he has a greater feeling of inferiority.
However, the correlations between I-X and Feradcom are comparatively
low and indicate that whatever the tests are measuring, they are not
measuring the same thing. Yet, we find that both introversion and
inferiority feeling tend to go with persons born in the warmer months
of the year.


TABLE X. COEFFICIENTS OF CORRELATION BETWEEN THE VARIABLES, THURSTONE
PSYCHOLOGICAL, INTROVERSION-EXTROVERSION, AND FERADCOM

           Thurstone Psychological   Thurstone Psychological   Feradcom
Number     and I-X                   and Feradcom              and I-X
288        -.07 ± .04                +.11 ± .04                -.39 ± .03
288         .00 ± .04                 .00 ± .04                -.27 ± .04
288        -.02 ± .04                 .04 ± .04                -.33 ± .03
288        +.10 ± .04                -.12 ± .04                -.38 ± .03
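
The report does not say what the figure following each coefficient in Table X represents. One plausible reading, offered here only as an assumption, is the probable error of a correlation coefficient, .6745(1 - r²)/√N, which was customary at the time. The sketch below applies that formula to two entries; on this reading most, though not necessarily all, of the tabled values are reproduced after rounding. The function name is hypothetical.

```python
from math import sqrt


def probable_error_of_r(r: float, n: int) -> float:
    """Probable error of a correlation coefficient r based on n cases.

    Assumption: the conventional formula 0.6745 * (1 - r^2) / sqrt(n);
    the report itself does not state which formula was used.
    """
    return 0.6745 * (1.0 - r * r) / sqrt(n)


# Two coefficients from Table X, each based on 288 cases.
print(round(probable_error_of_r(-0.39, 288), 2))  # Feradcom and I-X
print(round(probable_error_of_r(-0.07, 288), 2))  # Thurstone and I-X
```

On this assumption the correlations between intelligence and either personality test lie within about three probable errors of zero, whereas those between the two personality tests are many times their probable errors.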

SUMMARY 


The present study based on the scores of seven thousand eight 
hundred ninety-seven adolescents makes possible a comparison of the 
results with those based on children and adults. The results of the 
present study on the relation of intelligence and month or season of 
birth in general conform with those obtained with children and adults. 
Adolescents born in the Spring months (March 21—June 20) were on 
the average brighter than those born in Winter (December 21—March 
20). The difference, 4.50, is statistically significant—2.90 times its 
standard error. However, the comparison of warm months (March 
21—September 20) and cold months (September 21—March 20) gave a 
mean difference of 2.10, which is not statistically significant—1.91 
times its standard error. Considered individually the differences 
observed are not entirely conclusive, but the differences are never in 
favor of a cold group whether compared against a corresponding warm 
month, season or half-year. 

The results of the present study on the relation between month
of birth and introversion-extraversion or inferiority feeling show that
persons born in the cold seasons of Autumn and Winter seem to be
more extraverted. The greatest difference occurred between Summer
and Autumn, the standard ratios being 2.88 for the trimonthly grouping
and 2.58 for the astronomical grouping of months. We do
not obtain as marked trends when the seasons are combined into cold 
and warm months. The results for the test of inferiority are similar 
to those for the test of introversion-extraversion except that those for 
the former are less conclusive than the results for the latter. Yet, the 
tendency is always in the same direction; namely, that persons born 
in the Summer or warm months tend as a group to be introverted and 
exhibit feelings of inferiority. 

When we review the results for all three measures; namely, intelli- 
gence, introversion-extraversion, and inferiority feeling, we discern a 
consistent trend. We note that persons born during the warm months 
tend to have higher intelligence scores, and that as a group they seem 
to be introverted and to exhibit feelings of inferiority. The conclusion 
in so far as the intelligence factor is concerned is clearer since it rests on 
a broader investigational and factual basis. The major value of our 
results on the intelligence factor lies in their consistency with each other 
and with the findings of other investigators. On the other hand, the 
results for the personality factor must not be emphasized too much. 
Needless to point out, the inconclusive standard ratios, the complexity 
and number of aspects of personality, the intricate connections between 
personality and intelligence, the comparative fallibility of personality 
tests, make it imperative to mark time until other investigators report 
before drawing any conclusion on the relation between month of birth 
and personality. 

REFERENCES 


1. Blonsky, P. P.: “Früh- und Spätjahrkinder.” Jahrb. f. Kinderheilkunde,
Vol. CXXIV, 1929, pp. 115-120.

2. Fialkin, H. N. and Beckman, R. O.: “The Influence of Month of Birth on the
Intelligence Test Scores of Adults.” J. Genet. Psychol., Vol. LII, 1938, pp.
203-209.

3. Pintner, R.: “Intelligence and Month of Birth.” J. Appl. Psychol., Vol. XV,
1931, pp. 149-154.

4. Pintner, R. and Forlano, G.: “The Influence of Month of Birth on Intelligence
Quotients.” J. Educ. Psychol., Vol. XXIV, 1933, pp. 561-584.

5. Pintner, R. and Forlano, G.: “Season of Birth and Intelligence.” J. Genet.
Psychol., Vol. LIV, 1939, pp. 353-358.

6. Pintner, R. and Maller, J. B.: “Month of Birth and Average Intelligence among
Different Ethnic Groups.” J. Genet. Psychol., Vol. L, 1937, pp. 91-107.

7. Sanders, B. S.: Environment and Growth. Baltimore, Md.: Warwick and
York, 1934, pp. 375.




THE RELATIONSHIP BETWEEN REASONS STUDENTS 
GIVE FOR TAKING CERTAIN COURSES AND 
STUDENT ESTIMATES OF THESE COURSES 


EDITH B. MALLORY, MARGARET HUGGINS AND 
BERNICE STEINBERG 


Wellesley College 


PROBLEM 


The curriculum of the modern college is a somewhat variable 
quantity. In its slow but continuous readaptation, one of the current 
trends is toward a decrease in the amount of required work, and an 
increase in the number of courses open for election. When institu- 
tional restrictions are reduced, what factors influence the students’ 
choice of courses? Selection may be determined by a personal interest
in the subject, or by the desire to work under a particular professor. 
Advice of parents or of friends may be heeded. Some courses may be 
taken for still other reasons. Even though there may often be unreal- 
ized or unadmitted motives, it has appeared worth while to discover 
to what degree these various factors operate, in the opinion of the 
students themselves. 

It would seem important to learn also whether students estimate 
the courses which they take for one reason more favorably than those 
taken for another. Do groups of courses, if classified according to the 
bases of their selection, tend to vary in respect to their value to the 
student? Do they differ in regard to interest of subject-matter, 
interest of the class period, or pleasure afforded? Does the student 
work harder for some of these than for others? Is there any difference 
in the average grades? What are the students’ own impressions 
regarding the actual courses in which they have enrolled? 

Fundamental to all these questions lies the problem of the under- 
graduate attitude toward required versus elective work. While the 
actual subjects required in different colleges must differ so widely that 
it would be dangerous to generalize on the basis of one study, a detailed 
comparison of the students’ estimates of required and elective courses
should throw some light on the relative effectiveness of these two 
groups, as judged by the students themselves. 

This investigation is reported in the belief that even approximate 
answers to the foregoing inquiries will shed some light on an obscure 
field. If so, it may be of some use both to those who plan general
curricular requirements, and to those who help direct the program of
the individual student.


PROCEDURE 


Since the information sought was almost entirely subjective in 
nature, i.e., the reasons which students believed led them to select
their courses, and their estimates of the courses which they had taken, 
the questionnaire method was used. 

The subjects were three hundred students of a four-year liberal arts 
college for women. Of these, seventy were seniors, one hundred sixteen 
juniors, sixty-two sophomores, and fifty-two freshmen. It was 
thought desirable to have a relatively large number of juniors, since 
they, of all classes in this particular institution, have most freedom in 
their elective schedule. The group included students of all levels of 
achievement, from those on probation lists to Phi Beta Kappa mem- 
bers, and represented a cross-section of the enrollment in a large 
number of departments. 

The questionnaire required each student to consider specifically 
the courses which she had taken the previous semester. She was asked 
to indicate the reason for enrolling in each of these, and was also 
requested to rank her courses in six different respects. This ranking 
showed the relative standing of the courses according to (a) subjectively
estimated value to the student, (b) interest of class period, (c)
interest of subject-matter, (d) enjoyment derived, (e) amount of work 
done for the courses, and (f) grades received. 

These questionnaires were given individually to each of the subjects 
and were filled out in the presence of student-examiners. In order to 
insure honest and spontaneous replies it was arranged, and each subject 
was told, that no member of the faculty would know who filled out any 
given questionnaire blank. A further aid to free expression was the 
representation of each course by symbol, rather than by name, so that 
it could not be identified by anyone except the writer. The impor- 
tance of accuracy and frankness was stressed by the student-examiners, 
who reported that they believed the subjects had cooperated fully. 

Thirteen questionnaires were eliminated when they were found 
incorrect in respect to grades. The remaining questionnaires 
provided one thousand five hundred sixty-four individual reports upon 
specific courses, and it is upon these that the following results are 
based. 




RESULTS 
A. Reasons for Choice of Courses 


It was found that the influences credited with determining the 
choice of courses fell into five categories: (a) Parent’s Influence, (b) 
Friend’s Influence, (c) Preference for Professor, (d) Student’s Own 
Interests, and (e) Unknown, Unadmitted or Unclassified. The percentage 
of elective courses attributed to each of these appears in Table I. 


TABLE I. PERCENTAGE OF ELECTIVE COURSES CHOSEN ON EACH BASIS 


                                | Freshmen, | Sophomore, | Junior,  | Senior,  | All,
                                | per cent  | per cent   | per cent | per cent | per cent

Parent’s influence..............|    11     |     9      |    7     |    3     |    7
Friends’ influence..............|    11     |     9      |    9     |    7     |    9
Preference for professor........|     7     |    15      |   15     |   22     |   15
Student’s own interests.........|    62     |    63      |   63     |   63     |   63
Unknown (or unclassified).......|     9     |     4      |    6     |    5     |    6
All elective courses............|   100     |   100      |  100     |  100     |  100


By far the most frequent reason for choosing a course is the indi- 
vidual’s own interests connected with the subject-matter or field. 
This is said to explain the selection of nearly two-thirds of all the 
courses, and is strikingly consistent from year to year. Only a few 
admit reliance on parent’s influence, which wanes steadily from 
eleven per cent in the freshman year to three per cent the senior year. 
Friends’ influence, also eleven per cent for freshmen, is lowered more 
slowly to seven per cent senior year. There is also a decrease in the 
courses elected for ‘‘unknown”’ or unclassified reasons, from nine per 
cent to five per cent during the same interval. On the other hand, an 
increasing number of courses are chosen because of the professors 
giving them. It is natural that this figure should be low, (seven 
per cent), freshman year, since only a few entrants would know the 
faculty personnel. This proportion is at least doubled for sophomores 
and juniors, and more than a fifth of all senior courses are taken for 
this reason. 




B. Relative Estimates of Courses Chosen for Various Reasons 


Since each student designated the rank order of her courses accord- 
ing to (a) value, (b) interest in class period, (c) interest in subject- 
matter, (d) enjoyment, (e) amount of work and (f) grades received, 
it was possible to compare courses of the various ‘‘reason-for-choice”’ 
categories, in all these respects. 

Courses which were rated either first or second were counted 
“superior,” and those ranked last or next to last were regarded 
‘‘inferior.”” The number of superior courses was then divided by the 
sum of the superior and inferior ones, to derive a “superiority-index”’ 
for each category under consideration. It will be seen that an index 
is thus the per cent (of each group of courses) which was designated as 
superior. These indices, with their probable errors,¹ are presented in 
Table II. 
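The index just defined can be sketched in a few lines. The function name and the counts below are illustrative assumptions and are not taken from the study; only the rule itself (superior ratings divided by superior plus inferior ratings) comes from the text.

```python
def superiority_index(n_superior: int, n_inferior: int) -> float:
    """Share of superior ratings among all ratings counted as superior or inferior.

    Courses ranked in the middle are ignored, as in the text.
    """
    total = n_superior + n_inferior
    if total == 0:
        raise ValueError("no courses were rated superior or inferior")
    return n_superior / total


# Hypothetical counts, chosen only to illustrate the arithmetic.
print(superiority_index(30, 10))  # 0.75: three quarters of the extreme ratings were superior
print(superiority_index(12, 12))  # 0.5: superior and inferior ratings are equally frequent
```

An index of .50 is therefore the neutral point against which the figures in Table II may be read.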

The probable errors of the differences between indices, shown at the 
bottom of Table II, refer to the indices for all classes. They are 
based on the largest PE of all the indices in each group (i.e., .035 for 
“Parent’s Influence” and “Friends’ Influence,” .024 for “Preference 
for Professor,” .012 for “Student’s Own Interest” and .034 for the 
“Unknown” category), a treatment which simplifies presentation, 
and errs only in the direction of conservative interpretation. The 
difference between two indices may be regarded as significant if it is 
at least four times the PE of Diff. recorded for the groups involved. 
Thus, when compared for value, courses chosen because of “preference 
for professor” have a significantly higher index, .73, than that of 
courses taken because of “Friends’ influence,” .48, since the difference 
between them, .25, is six times the PE of the difference between these 
groups, .041. On the other hand, when courses chosen because of 
“Parent’s influence” are considered by themselves, it is found that 
the index for “value,” .42, is not significantly higher than the index for 
“interest in class,” .33, since the difference, .09, is only 1.8 times .05, 
the probable error of the difference between any two indices within the 
“Parent’s influence” group. 
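The decision rule in this paragraph can be written out as a short check. This is a minimal sketch: the function names and the `threshold` parameter are assumptions made for illustration, while the indices and PE values are the ones quoted in the text.

```python
def critical_ratio(index_a: float, index_b: float, pe_diff: float) -> float:
    """Absolute difference between two indices, expressed in units of PE of Diff."""
    return abs(index_a - index_b) / pe_diff


def is_significant(index_a: float, index_b: float, pe_diff: float,
                   threshold: float = 4.0) -> bool:
    """Apply the rule that a difference must be at least four times its PE."""
    return critical_ratio(index_a, index_b, pe_diff) >= threshold


# "Preference for professor" (.73) against "Friends' influence" (.48), PE of Diff. = .041
print(round(critical_ratio(0.73, 0.48, 0.041), 1))  # 6.1
print(is_significant(0.73, 0.48, 0.041))            # True

# "Value" (.42) against "interest in class" (.33) within "Parent's influence", PE of Diff. = .05
print(round(critical_ratio(0.42, 0.33, 0.05), 1))   # 1.8
print(is_significant(0.42, 0.33, 0.05))             # False
```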

It should be noted that when separate class groups are considered, 
rather than all four classes together, the smaller numbers afford less 
reliable statistical data. For this reason, analyses of temporal trends 


¹ These PE’s are derived from the tables of Edgerton, Harold, and Paterson, 
D. G.: “Table of Standard Errors and Probable Errors of Percentages for Varying 
Numbers of Cases.” J. Appl. Psychol., Vol. x, pp. 378-391. 




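TABLE II. SUPERIORITY INDICES¹ OF COURSES SELECTED FOR VARIOUS REASONS 


                      |  N  | Value | Enjoy- | Interest | Interest   | Work | Grades
                      |     |       | ment   | in class | in subject |      |

Parent’s influence
  Freshmen............|  21 | .65   | .71    | .60      | .62        | .68  | .41
  Sophomores..........|  22 | .44   | .38    | .36      | .50        | .50  | .57
  Juniors.............|  39 | .34   | .24    | .20      | .27        | .37  | .44
  Seniors.............| [row illegible]
  All.................|  91 | .42   | .36    | .33      | .38        | .49  | .43
  PE..................|     | .035  | .034   | .033     | .034       | .035 | .035

Friends’ influence
  Freshmen............|  21 | .44   | .32    | .35      | .43        | .67  | .30
  Sophomores..........|  22 | .25   | .23    | .23      | .89        | .57  | .33
  Juniors.............|  50 | .42   | .50    | .59      | .52        | .46  | .40
  Seniors.............|  21 | .72   | .60    | .54      | .59        | .57  | .26
  All.................| 114 | .48   | .46    | .50      | .50        | .52  | .35
  PE..................| [row illegible]

Preference for professor
  Freshmen............|  13 | .91   | 1.00   | .90      | .82        | .77  | .79
  Sophomores..........|  37 | .77  .77  .72  .60  .60  [one entry illegible]
  Juniors.............|  83 | .76   | .83    | .80      | .83        | .68  | .60
  Seniors.............|  65 | .74   | .63    | .70      | .59        | .62  | .47
  All.................| 198 | .73   | .78    | .76      | .72        | .65  | .56
  PE..................|     | .022  | .020   | .020     | .022       | .023 | .024

Student’s own interests
  Freshmen............| 112 | .76   | .78    | .73      | .76        | .78  | .54
  Sophomores..........| 158 | .58   | .67    | .58      | .60        | .57  | .67
  Juniors.............| 348 | .56   | .57    | .59      | .56        | .55  | .55
  Seniors.............| 188 | .57   | .57    | .55      | .55        | .56  | .52
  All.................| 806 | .59   | .61    | .58      | .59        | .57  | .55
  PE..................|     | [three entries illegible]       | .011       | .011 | .012

Unknown or unclassified
  Freshmen............|  16 | .22  .31  .15  .33  .54  [one entry illegible]
  Sophomores..........|  10 | .13   | .20    | .29      | .18        | .50  | .24
  Juniors.............|  33 | .27  .21  .21  .13  .23  [one entry illegible]
  Seniors.............|  15 | .19   | .25    | .14      | .15        | .36  | .73
  All.................|  74 | .22   | .24    | .23      | .20        | .26  | .52
  PE..................|     | .032  | .034   | .033     | .031       | .034 | .039


Maximal PE’s² of the Differences between the Indices in the Groups 

                          | Parent’s  | Friends’  | Preference    | Student’s    | Other (or
                          | influence | influence | for professor | own interest | unknown)

Parent’s influence........| .05       |
Friends’ influence........| .05       | .05       |
Preference for professor..| .041      | .041      | [illegible]   |
Student’s own interest....| .037      | .037      | .022          | .016         |
Other (or unknown)........| .053      | .053      | .045          | .039         | .053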


¹ The superiority index is a percentage, found by dividing the number of courses rated superior 
by the sum of the inferior and superior courses. Thus a .50 index shows that the number of courses 
rated superior was exactly equal to the number rated inferior; an index above .50 indicates that more 
than half were superior; an index below .50 shows that a majority were inferior. 
² The PE Diff.’s are derived from the largest PE of the indices in every group. 




based on differences between figures for the several classes must 
be rather speculative and are offered only tentatively as possible 
interpretations. 

It will be seen that freshman indices all tend to be higher than those 
of other classes. This is primarily due to the fact that so many of the 
“inferior”? rankings were given to required courses, of which freshmen 
carry appreciably more than sophomores, while juniors and seniors 
ordinarily carry none. This must be kept in mind when comparisons 
between classes are made, but it does not alter the relative standing of 
the several “‘reason’’-groupings, when freshman courses are considered 
either by themselves, or in combination with those of the other classes. 

Of all the types of reasons acknowledged, “preference for professor” 
seems to lead to the courses which are most rewarding. While the 
excellence of this category is especially impressive as regards “interest 
in class period,” its superiority does not stop there. This group is 
highest also in “‘value”’ as well as in “enjoyment,” and it surpasses, 
for “interest in subject-matter,” even those courses which were supposedly 
chosen because of the student’s avowed interest in the particular 
fields. This superiority is found not only when all the students 
are considered together, but remains fairly consistent for all classes. 
One striking feature here is the relatively large amount of work which 
is reported to be expended on these courses, even though grades for 
them tend, on the whole, to be lowered between the first year and the 
last. 

Second on the list stand those courses which a student selects 
according to the dictates of her own interests. The superiority of 
these is moderate, but they are consistently and significantly above 
average in all respects. Probably the higher freshman figures would 
really be equivalent to the others, if we could make allowance for 
the effect of freshman required courses, noted earlier. 

Courses chosen because of the recommendations of friends receive 
third place for all the subjective estimates given. When all four classes 
are taken together, the chances are equal for inferior or superior ratings 
of these courses. Except for grades, which tend to be low, none of the 
final indices of this group differ significantly from .50. It will be 
observed, however, that there is an appreciable change from year to 
year. Friends’ recommendations appear to improve markedly, prob- 
ably because the “friends” are increasingly likely to be students who 
have themselves taken the courses in question. Elections in this group 
seem to be poor for freshmen and sophomores, while juniors report 
them superior as often as inferior, and seniors tend to rate them high. 
The low index for senior ‘‘grades”’ suggests that students do not 
necessarily encourage others in the selection of courses that are easy. 

Courses taken because of parents’ influence prove significantly 
below average in all respects, except for amount of work and for grades, 
which are close to average. Not only is parents’ advice less frequently 
relied upon as time goes on—as appears in an earlier section—but it 
seems to grow less helpful. Courses said to be selected on this basis are 
progressively inferior in all subjective ratings, and require of seniors 
the greatest amount of work while they yield them their lowest grades. 

It is probably only fair, however, to note that when a parent 
recommends several courses, the ones which also fit in with the “‘stu- 
dents’ interests” will probably be assigned to the latter category, so 
that the parent gets credit (or blame) for advocating only those courses 
which do not have any great intrinsic appeal for the student, and so 
are likely to prove unsatisfactory. The writers think it would be 
unfortunate if these figures were interpreted to mean that parents 
should keep hands off entirely! 

Courses reported as chosen for ‘‘unknown”’ reasons are doubtless a 
miscellany. One is tempted to suspect that the classification has been 
employed for the ones in which a student has been disappointed, and 
so led to wonder, ‘‘Why in the world did I ever take that course!’’ 
However, it is certainly interesting to note that elections not attrib- 
utable to the four other types of reasons are likely to be unfortunate. 
The actual indices—all twenty-four of which are consistently low with 
one single exception—would seem to point to their being inferior. 
The one exception, however, is itself significant; it is a relatively very 
high index for senior marks. It takes no clairvoyance to suspect here 
the presence of courses selected because the student anticipated that 
they would yield the maximum grades for the minimum effort. 


C. A Comparison of Required and Elective Courses 


Required and elective courses, carried by freshmen and sophomores, 
may be compared by reference to Table III. Freshmen unquestion- 
ably tend to enjoy elective courses far more than required ones, think 
them more interesting in subject-matter, and get better grades in them. 
Elective courses have higher indices for value and amount of work 
also, but these differences are probably not significant, since the critical 
ratio of the difference over PE Diff. is in each case less than 4. Both 
types of course are rated about the same for interest in class period. 




Sophomores find elective courses significantly superior to required 
ones, when ranking for “value,” “enjoyment,” “interest in class,” 
and “interest of subject.” They are likely to get lower “grades” in 
their elective courses than in required ones, even though they do more 
“work” for them. This is probably because in elective courses they 
must more often compete with upper-classmen. Unfortunately this 
study offers no basis for determining whether the required subjects 
rank so poorly because they really have no appeal for the students, or 
because the student resents the fact that they are required. 


TABLE III. COMPARISON OF REQUIRED AND ELECTIVE COURSES FOR FRESHMEN 
AND SOPHOMORES 

Freshmen Courses 

                        |   Required    |   Elective    |       |  PE   |
                        |   (N = 97)    |   (N = 183)   | Diff. | Diff. |  CR
                        | Index |  PE   | Index |  PE   |       |       |

Value...................| .44   | .034  | .53   | .025  | .09   | .042  |  2.14
Enjoyment...............| .23   | .029  | .60   | .024  | .37   | .038  |  9.72
Interest in class.......| .49   | .034  | .51   | .025  | .02   | .042  |   .48
Interest in subject.....| .37   | .035  | .56   | .024  | .19   | .042  |  4.50
Work....................| .41   | .034  | .55   | .024  | .14   | .042  |  3.32
Grades..................| .33   | .032  | .62   | .024  | .29   | .040  |  7.25

Sophomore Courses 

                        |   Required    |   Elective    |       |  PE   |
                        |   (N = 106)   |   (N = 249)   | Diff. | Diff. |  CR
                        | Index |  PE   | Index |  PE   |       |       |

Value...................| .25   | .028  | .64   | .020  | .39   | .034  | 11.49
Enjoyment...............| .23   | .028  | .64   | .020  | .41   | .034  | 12.05
Interest in class.......| .29   | .030  | .65   | .020  | .36   | .037  |  9.75
Interest in subject.....| .26   | .029  | .64   | .020  | .38   | .035  | 10.85
Work....................| .30   | .030  | [.66] | .020  | .36   | .037  |  9.75
Grades..................| .70   | .030  | .32   | .020  | .38   | .036  | 10.55
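The PE Diff. column of Table III is not explained in the text. A minimal sketch, assuming the usual rule that the probable error of a difference between two independent indices is the square root of the sum of their squared PE's, reproduces the printed values for the two rows used below. The function name is an assumption made for illustration; the indices and PE's are copied from the table.

```python
import math


def pe_of_difference(pe_a: float, pe_b: float) -> float:
    """PE of the difference between two independent indices (assumed rule)."""
    return math.sqrt(pe_a ** 2 + pe_b ** 2)


# (label, required index, required PE, elective index, elective PE), from Table III
rows = [
    ("Freshmen, interest in class", 0.49, 0.034, 0.51, 0.025),
    ("Sophomores, interest in subject", 0.26, 0.029, 0.64, 0.020),
]

for label, req, pe_req, elec, pe_elec in rows:
    pe_diff = pe_of_difference(pe_req, pe_elec)
    cr = abs(elec - req) / pe_diff  # critical ratio: difference over its PE
    print(f"{label}: PE Diff. = {pe_diff:.3f}, CR = {cr:.2f}")
```

The PE Diff. values come out at .042 and .035, as printed. The critical ratios differ slightly in the last digits from the printed column, which would be expected if the table divided by the rounded PE Diff.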


D. Evaluation of the Findings 


The writers would be the first to admit that subjective reports such 
as those upon which this study is based, are not necessarily a precise 
record of fact. The revelation of “snap” courses taken by seniors 
who claimed that they “didn’t know” why they took them seems clear 
evidence of at least one instance of this defect. Another suspicious 
aspect is the high intercorrelation of all the subjective estimates (i.e., 
“value,” “enjoyment,” “interest in class” and in “subject”), since 
this probably indicates a high halo effect as well as real concomitance 
of trends in these values. Another point of importance is the fact 
that a student’s estimate—even when a true report of her subjective 
attitude—cannot be assumed to reflect an absolute value, since a 
course may perhaps possess some basic excellence which a student 
does not immediately perceive. It must be noted further that the 
estimates are all relative within each student’s experience. No dis- 
tinction can be made between the reports of a student who found her 
courses all fairly interesting, and another student who found them all 
rather dull. 

In spite of all these considerations, however, the writers are con- 
vinced that student opinion has a real significance. It is believed that 
a study of this sort, in which the reports are probably very frank, and 
directed not merely toward general lines of interest but toward specific, 
recent courses, may be of practical help in giving advisors and deans 
insight in counseling. 

In the writers’ opinion, three major points emerge with especial 
significance. First of these is the importance of effective teachers— 
individuals who not only know their subjects but who also interest 
and stimulate the students to higher levels of academic enthusiasm. 
Second is the evidence of the soundness of a student’s own interests 
as motivating forces. Third is the striking contrast between estimates 
of elective and required courses. The reported inferiority of the latter 
is great enough to raise forcibly the question whether subtler methods 
of curriculum control should not be sought. 


SUMMARY AND CONCLUSIONS 


(1) The students who have been in college longer are more likely 
to take courses because of “preference for work with a particular 
professor.” During all four college years, students tend to find greatest 
satisfaction and interest in courses selected on this basis, and they do 
relatively the greatest amount of work for these courses. 

(2) Throughout their college course, students choose about two-thirds 
of their subjects because of what they regard as their “own 
interests.” Courses so selected are a little more likely to be rated 
superior than inferior, in all respects. 




(3) “Parent’s influence” determines the choice of relatively few 
courses. These are rated low, on the average, especially toward 
senior year. 

(4) Courses chosen because of “friends’ advice” are neither high 
nor low on the average, but are more rewarding for upperclassmen than 
for underclassmen. 

(5) Courses selected for reasons which the students cannot or will 
not identify usually prove very unsatisfactory, and this is true even 
of those in which high grades may be received. 

(6) “Elective courses” tend to be ranked higher than “required 
courses,” whether the student estimates them in regard to value, 
enjoyment, interest in class period or interest in subject-matter. Stu- 
dents are likely to work harder for elective courses than for required 
ones, even though this difference is not reflected in their grades. 




CONFIDENCE AND ACHIEVEMENT IN EIGHT 
BRANCHES OF KNOWLEDGE* 


DONALD M. JOHNSON 
Fort Hays Kansas State College 


Two factors which one might consider important in determining an 
individual’s confidence in a decision or statement are: (1) His knowl- 
edge or achievement in the specific field of discussion, and (2) his 
general level of confidence, i.e., a personality trait. The present investigation 
is concerned with the importance of these two variables in 
determining the confidence which an individual expresses in his 
judgments. 

Some of the questions involved in the relations between achieve- 
ment and confidence are of educational significance. Does high 
achievement in a given field go with high confidence in that field? 
Or, to turn the question around: Have the confident people anything 
to be confident of? And some related questions are of theoretical 
significance in connection with the old problem of the consistency of 
the individual or the generality of personality traits. Will the person 
who is confident in one field be confident in others? And, if generality 
of confidence is found, is it dependent upon generality of achievement? 
Another question concerns general relations rather than individual 
differences. Will any given subject be more confident in those fields 
in which his achievement is higher? 

Some of these questions have been attacked in an incidental way. 
But an adequate answer to any of these questions requires tests in 
several fields of knowledge which will yield reliable scores for both 
achievement and confidence. Achievement tests of satisfactory reli- 
ability are easy to find but most of the investigations of confidence 
have ignored the troublesome problem of test reliability. In a pre- 
vious investigation by the writer (8) a simple method of reporting 
confidence was found to give reliable scores for mean confidence. 
This method, with some modification, has been followed, therefore, in 
the present investigation. 

The Michigan Vocabulary Profile Test, prepared under the direc- 
tion of E. B. Greene (4), appeared to fill our requirements for a battery 


* Portions of this paper were read to the Kansas Academy of Science at Wichita, 
March 29, 1940, and portions were read to the Midwestern Psychological Associa- 
tion at Chicago, May 3, 1940. 



of achievement tests.* This battery gives scores for vocabulary in 
eight relatively independent fields of information. The names of the 
tests (see Table I) indicate the content, with the exception of Test 1, 
which deals with “mental and social processes and situations.” Each 
test consists of thirty four-choice items arranged in ten levels of diffi- 
culty. Form Am was used. An example from Test 5 follows. 


Tendon at the back of the heel: 
(1) Cordiform. (2) Deltoid. (3) Patellar. (4) Achilles. 


This battery has several advantages for our purposes. The tests 
are fairly independent and cover a wide range of knowledge. Correla- 
tions among the tests range from .00 to .50, with a median of .27, 
according to the author’s report (5). Tests which are more inde- 
pendent could no doubt be found but tests which show various degrees 
of relationship will probably yield the most information about the 
relation between achievement and confidence. Repeat reliabilities 
are quoted between .78 and .94 (see Table II). The difficulty of the 
items is such that there will be wide variations in the amount of con- 
fidence with which the responses are given. This seems to be a 
requirement for a test of individual differences in confidence. 

The subjects were students in the writer’s classes in general psychol- 
ogy at Fort Hays Kansas State College. Psychology is a required 
course at Fort Hays; this group is fairly representative of the student 
body. Over a hundred students were tested but, since the tests were 
given on three days, only ninety-six records are complete. The data 
reported are (unless exceptions are noted) for ninety-six subjects, 
thirty-three men and sixty-three women, of whom about ninety were 
freshmen. 

The testing procedure was as follows: The subjects indicated their 
answers on separate answer sheets. An additional answer sheet, of a 
different color, was given to each subject for the report of confidence. 
On these check sheets were one hundred large numbers, corresponding 
to the numbers of the test items. Beside each of these large numbers 
were small numbers, 1, 2, 3, 4, 5, for the report of confidence. After 
the usual test instructions had been given, the following instructions 
in respect to confidence were given: 


After indicating your answer on the answer sheet, indicate your confidence 
in this answer on the sheet marked ‘‘Confidence.”” Number 5 represents 


* The writer is indebted to Professor H. B. Reed for procuring the tests and for 
general facilitation of the study. 




100 per cent confidence or complete certainty, no possibility of error. Number 
1 represents 0 per cent or no confidence at all, a pure guess. Number 2 repre- 
sents 25 per cent confidence, Number 3 50 per cent and Number 4 75 per cent. 
Show the confidence you feel in each choice by encircling the appropriate 
number. 
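The five-step scale in these instructions maps onto percentages in equal steps of 25. The sketch below shows that mapping and the averaging of one subject's reports. The function names and the sample ratings are hypothetical and are introduced only for illustration.

```python
def rating_to_percent(rating: int) -> int:
    """Convert an encircled number (1 to 5) to the per cent confidence it stands for."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError("confidence must be reported as 1, 2, 3, 4 or 5")
    return (rating - 1) * 25  # 1 -> 0, 2 -> 25, 3 -> 50, 4 -> 75, 5 -> 100


def mean_confidence(ratings: list[int]) -> float:
    """Mean per cent confidence over a set of items."""
    return sum(rating_to_percent(r) for r in ratings) / len(ratings)


# Hypothetical reports for six items; not data from the study.
sample = [5, 3, 1, 4, 2, 5]
print([rating_to_percent(r) for r in sample])  # [100, 50, 0, 75, 25, 100]
print(round(mean_confidence(sample), 1))       # 58.3
```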


The time required for each test of thirty items was approximately 
fifteen minutes. Tests 1, 2 and 3 were given on a Monday, Tests 4, 5 
and 6 on the following Wednesday and Tests 7 and 8 on the following 
Friday. 

On the first day five minutes’ practice was given by having the sub- 
jects compare lengths of lines and guess the population of cities, report- 
ing their confidence in accordance with the above instructions. Such 
practice forces each subject to construct a scale of confidence before 
undertaking the tests and probably increases the reliability of the first 
test. 


RESULTS 


It is an assumption of the present technique that confidence in a 
judgment is a continuous quantitative psychological variable amenable 
to statistical treatment for the determination of individual differences. 
The confidence reports were, therefore, treated as scores, averaged, etc. 


TABLE I. STATISTICS FOR CONFIDENCE ON EIGHT TESTS OF VOCABULARY 


Test | Mean | SD | Reliability coefficient 


Table I shows the means, sigmas, and reliabilities for ninety-six subjects 
on each of the eight tests. The reliabilities were computed by the odd- 
even method, with the aid of the Spearman-Brown formula. These 
high reliability coefficients give empirical justification to the quantitative 
treatment of the confidence reports. 
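The odd-even method with the Spearman-Brown step-up can be sketched as follows. The helper names and the sample reports are hypothetical; the step-up formula, r_full = 2r / (1 + r), is the standard Spearman-Brown formula for doubling test length and is assumed to be the form used here.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half: float) -> float:
    """Step a half-test correlation up to the reliability of the full-length test."""
    return 2 * r_half / (1 + r_half)


def odd_even_reliability(reports: list[list[int]]) -> float:
    """reports[s][i] is subject s's confidence report on item i (items in test order)."""
    odd_totals = [sum(person[0::2]) for person in reports]   # items 1, 3, 5, ...
    even_totals = [sum(person[1::2]) for person in reports]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_totals, even_totals))


# Hypothetical reports (1 to 5) of four subjects on six items; not data from the study.
reports = [
    [5, 4, 5, 5, 4, 5],
    [2, 1, 2, 2, 1, 1],
    [3, 3, 4, 3, 3, 3],
    [4, 5, 4, 4, 4, 5],
]
print(round(odd_even_reliability(reports), 3))
```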




The mean confidence scores should be interpreted in the light of the 
instructions given to the subjects. They represent positions on a 
scale of 0 to 1.00, but in this report the decimal points are omitted. 


TABLE II. STATISTICS FOR ACHIEVEMENT ON TESTS OF VOCABULARY 


Test | Mean | SD | Reliability coefficient 


Table II presents similar statistics for the vocabulary or achieve- 
ment scores. The reliability coefficients are those reported by the 
author of the test (5). They were obtained by correlating scores on 
two equivalent forms. 

One of the chief problems of this investigation is the relation 
between the confidence scores and the achievement scores. Are the 
subjects who show greater achievement more confident of their 
achievement? These correlations are shown in Table III. Correlations 
above .24 are statistically significant. 


TABLE III. CORRELATIONS BETWEEN CONFIDENCE AND ACHIEVEMENT ON EIGHT 
VOCABULARY TESTS 


Test | Raw | Corrected 


These correlations indicate that, if there is a relation between 
confidence and achievement, it is an unstable relation which appears 
only under special conditions. Three of the r’s in Table III are not 
significant, four are low but statistically significant, and one is rather 
high. Looking back at Tables I and II we note that the sigmas for 
both the confidence and achievement distributions from Test 8 were 
high. This is probably one reason for the relatively high correlation 
between confidence and achievement in the test of knowledge about 
sports. 

Variations in exposure to the material of the tests may have a 
bearing also. Let us suppose that the confidence shown in dealing 
with any item depends upon (1) feelings of certainty or uncertainty 
arising directly from the specific item under consideration, and (2) a 
general feeling of being “‘at home” with the content of the test, an 
“atmosphere effect”’ (11) resulting from previous exposure to the 
material of that test. Individual differences in the first factor appear 
to be randomly distributed and unrelated to achievement. And 
individual differences in the second factor would also be randomly dis- 
tributed on the first seven tests since exposure to the academic material 
of these tests is fairly standard. But exposure to a vocabulary of 
sports is not uniform but special and personal, and one would expect 
that variations in exposure would produce concomitant variations in 
both achievement and the general feeling of familiarity. 

Whatever the reason for the one moderately high correlation, the 
other low ones may be accepted as typical of the usual educational 
situation and show that confidence is not dependent upon achievement 
in any direct way. 


TABLE IV. CORRELATIONS AMONG CONFIDENCE SCORES ON EIGHT TESTS 


                           |  1   |  2   |  3   |  4   |  5   |  6   |  7

1. Human relations.........|
4. Physical sciences.......| .339 | .546 | .477
5. Biological sciences.....| .514 | .498 | .498 | .637
6. Mathematics.............| .343 | .508 | .562 | .621 | .598
7. ........................| .596 | .609 | .569 | .451 | .600 | .438
8. ........................| .291 | .480 | .557 | .516 | .412 | .461 | .461


The confidence scores were next transmuted into sigma scores and 
the twenty-eight product-moment correlations among the eight tests 
were computed. These raw correlations appear in Table IV. All 
these correlations are positive, and they are all statistically significant. 




This is an argument for generality of confidence or—in other terms— 
for the consistency of the personality in respect to this variable. The 
mean correlation is .518. When these r’s are corrected for attenuation 
and averaged, the mean correlation is .568. 
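The text does not give the correction formula. A minimal sketch, assuming the standard Spearman correction in which the raw correlation is divided by the square root of the product of the two reliabilities, is shown below. The function name and the sample figures are hypothetical and are not values from this study.

```python
import math


def correct_for_attenuation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Estimate the correlation that would be found with perfectly reliable measures."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie between 0 and 1")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical example: a raw r of .45 between two tests with reliabilities .81 and 1.00.
print(round(correct_for_attenuation(0.45, 0.81, 1.00), 2))  # 0.5
```

Because the divisor cannot exceed 1, a corrected coefficient is never smaller in size than the raw one, which agrees with the direction of the change from .518 to .568.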

Table IV, as it stands, enables us to conclude that, when an indi- 
vidual is confident in one field of knowledge, he is likely to be confident 
in others. It is instructive, also, to analyze the correlation table in 
more detail by one of the methods of factor analysis. Holzinger’s 
bifactor method is the method of choice since it is devised for dealing 
with correlation tables which show a general factor. Holzinger’s 
method (6) was followed, therefore, and the resulting factor pattern is 
presented in Table V. 


TABLE V. FACTOR PATTERN: CONFIDENCE 


                           |                  Factor weights
Test                       | General | Beta  | Gamma | Specificity | Unreliability

...........................| .691    | ..... | .325  | .594        | .253
Total......................| 3.816   | .780  | .372  | 2.317       | .715

Per cent variance..........| 47.6    | 9.8   | 4.7   | 29.0        | 8.9


After the general and group factors were removed, the residual 
correlations were small. The PE of the residual correlations was .029, 
while the PE of a zero r from ninety-six cases would be .067. 

Our problem is not the usual factor-analysis problem. The present 
study was not designed to discover fundamental factors underlying 
confidence in the various fields sampled. We are concerned chiefly 
with the fact that almost half the variance is accounted for by the 
general factor of confidence. This finding should be of interest to those 
who believe that the personality is a reflection of the specific situation 
in which the person finds himself. 
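The "per cent variance" row of Table V appears to follow from the "Total" row: with eight standardized tests the total variance is 8, so each factor's share is its total divided by 8. The sketch below rests on that reading, which is an assumption; the totals are copied from the table, and the variable names are illustrative.

```python
# Totals from the "Total" row of Table V.
totals = {
    "General": 3.816,
    "Beta": 0.780,
    "Gamma": 0.372,
    "Specificity": 2.317,
    "Unreliability": 0.715,
}
n_tests = 8  # each standardized test contributes one unit of variance

# The five totals should together exhaust the variance of the eight tests.
print(round(sum(totals.values()), 3))  # 8.0

shares = {name: 100 * total / n_tests for name, total in totals.items()}
for name, share in shares.items():
    print(f"{name}: {share:.2f} per cent")
```

The shares agree with the printed row to within about a tenth of a point, and the general factor comes out at a little under half of the variance.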

The reason for the grouping which appears in Table V is not important 
for present purposes. The mean of the correlations among 
Tests 1, 2, 3 and 7 is .609; among Tests 4, 5 and 6 it is .619. These are 
to be compared with the previously mentioned mean correlation of 
.518 for all twenty-eight correlations. The grouping appears to be 
definite, but not of great importance, since the two group factors 
together account for but fourteen and five-tenths per cent of the 
variance. 


TABLE VI. CORRELATIONS AMONG ACHIEVEMENT SCORES ON EIGHT TESTS 


                           |  1   |  2   |  3   |  4   |  5   |  6   |  7

1. Human relations.........|
2. ........................| .392
3. Government..............| .477 | .422
4. Physical sciences.......| .299 | .261 | .349
5. Biological sciences.....| .403 | .238 | .378 | .294
6. Mathematics.............| .291 | .219 | .313 | .200 | .259
7. ........................| .188 | .123 | .175 | .057 | .249 | .079
8. ........................| .176 | .235 | .196 | .339 | .186 | .276 | .190

The achievement scores were also transmuted into sigma scores and 
the twenty-eight correlations among the eight tests were computed. 
These are presented in Table VI. They show general, but not close, 
agreement with those reported by the author of the test (5). The 
mean of these correlations is .259. A factor analysis, according to 
Holzinger’s method, was performed on this table in order to determine 
the importance of the general vocabulary factor. The bifactor analy- 
sis disclosed no group factors—even when the factor pattern suggested 
by Table V was assumed—hence Table VII shows a two-factor pattern. 
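The mean of .259 can be checked directly from the twenty-eight coefficients in Table VI. The sketch below copies the lower triangle of the table row by row; the list layout and names are illustrative only.

```python
# Lower triangle of Table VI, one list per row (Tests 2 through 8).
table_vi = [
    [0.392],
    [0.477, 0.422],
    [0.299, 0.261, 0.349],
    [0.403, 0.238, 0.378, 0.294],
    [0.291, 0.219, 0.313, 0.200, 0.259],
    [0.188, 0.123, 0.175, 0.057, 0.249, 0.079],
    [0.176, 0.235, 0.196, 0.339, 0.186, 0.276, 0.190],
]

# Flatten the triangle into the 28 pairwise correlations.
coefficients = [r for row in table_vi for r in row]
mean_r = sum(coefficients) / len(coefficients)

print(len(coefficients), round(mean_r, 3))  # 28 0.259
```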

From a comparison of Tables IV and VI it is apparent that the 
correlations among the confidence scores are higher than the correla- 
tions among the achievement scores. And from a comparison of 
Tables V and VII we see that the general confidence factor is statis- 
tically more important than the general achievement factor. It is 
quite likely that some of the variance which the bifactor method allots 
to specificity would, by some of the other methods of factor analysis, 
be allotted to one or more group factors. This is not our problem at 
the moment, however. We merely wish to compare the importance 
of the general achievement factor with the importance of the general 
confidence factor and it is the writer’s belief that Holzinger’s bifactor 
method, yielding Tables V and VII, makes that comparison possible. 




It is also interesting to know whether the correlation for confidence 
on any two tests will be high when the correlation for achievement on 
the two tests is relatively high. So the rank-difference correlation 
between the confidence correlations in Table IV and the achievement 
correlations in Table VI was computed, and found to be .20 ± .13.
Evidently the confidence correlations are not dependent upon the
achievement correlations in any significant way.
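The probable error attached to this coefficient is consistent with the conventional textbook formula for a rank-difference correlation, PE = .7063(1 - rho^2)/sqrt(N), taking N as the twenty-eight pairs of correlations. The paper does not say which formula was used, so the sketch below is an assumption, and the function name is hypothetical.

```python
import math


def probable_error_rho(rho: float, n: int) -> float:
    """Probable error of a rank-difference correlation (classical formula, assumed here)."""
    return 0.7063 * (1 - rho ** 2) / math.sqrt(n)


# rho = .20 between the 28 confidence and 28 achievement correlations.
pe = probable_error_rho(0.20, 28)
print(round(pe, 2))  # 0.13
```

On this reading the coefficient is less than twice its probable error, which fits the statement that the relation is not significant.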


TABLE VII.—FACTOR PATTERN—ACHIEVEMENT


                              Factor weights
Test                 General   Specificity   Unreliability

Total                 2.177       4.519          3.176
Per cent variance      27.2        56.4           16.4


Because of the generality among the eight tests of confidence, as 
shown in Tables IV and V, it is permissible to combine them to get a 
score for confidence in general. The sigma scores were used for this 
purpose. Some correlations between this general confidence and 
other variables are shown in Table VIII. Correlations above .24 are 
significant. 


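TABLE VIII.—CONFIDENCE AND OTHER VARIABLES


Confidence and total achievement...........................  .25
Confidence and intelligence (N = 86).......................  .22
Confidence and intelligence with achievement partialled out  .11
...........................................................  .03
Confidence and Bernreuter F1-C (N = 81)....................  .13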
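The third entry of Table VIII is a first-order partial correlation. It needs the correlation between intelligence and achievement, which is not reported in this section, so the sketch below substitutes a hypothetical value of .50 purely for illustration. It also ignores the fact that the .25 and .22 rest on somewhat different numbers of cases.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


r_conf_intel = 0.22   # Table VIII
r_conf_achiev = 0.25  # Table VIII
r_intel_achiev = 0.50 # HYPOTHETICAL: not reported in the text

partial = partial_r(r_conf_intel, r_conf_achiev, r_intel_achiev)
print(round(partial, 2))
```

With the assumed value the partial comes out near the tabled .11, but that agreement depends entirely on the assumption.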


The correlation between general confidence and total achieve- 
ment supplements the correlations between confidence and achieve- 
ment on the separate tests. This correlation of .25 is barely significant 
and cannot be taken very seriously. This is in agreement with 
Greene’s results (3), and with Trow’s (12). 




The intelligence scores were scores on the Henmon-Nelson intelli- 
gence test for college students. Sophomores were excluded because 
their scores were a year old, and scores for some other subjects were 
not available, hence the correlation is based on eighty-six cases. The 
relation between confidence and intelligence, like that between con- 
fidence and achievement, is on the borderline of statistical significance. 

Greene’s results, on somewhat different material, were similar. 
He had confidence reported in terms of A, B, and C. He summarizes 
his results as follows: “By comparing school success, Thorndike tests
(for intelligence), and success on the test in question with the total 
number of A’s, we find a nearly chance correlation throughout” 
(3, p. 477). 

Trow, however, found moderately high negative correlations 
between confidence and intelligence. For confidence and Army 
Alpha, r = -.42 ± .14; for confidence and Thorndike intelligence,
r = -.56 ± .12 (12, p. 31). Trow’s correlations are based on only
fifteen cases and the PE’s are high. The differences between Trow’s 
results and those of the present study are probably due to sampling 
differences and differences in the material of the tests. 
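The size of Trow's probable errors follows from the small number of cases. A minimal sketch, assuming the usual formula for the probable error of a product-moment coefficient, PE = .6745(1 - r^2)/sqrt(N), reproduces both quoted values with N = 15; the function name is illustrative.

```python
import math


def probable_error_r(r: float, n: int) -> float:
    """Probable error of a product-moment correlation (classical formula, assumed here)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


# Trow's two coefficients, each based on fifteen cases.
pe_army_alpha = probable_error_r(-0.42, 15)
pe_thorndike = probable_error_r(-0.56, 15)

print(round(pe_army_alpha, 2), round(pe_thorndike, 2))  # 0.14 0.12
```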

It is safe to conclude that the more intelligent college students are 
not very likely to be more confident than the average; they may be less. 

The correlation between confidence and height is low and cannot be
taken seriously. The health index is a rating made by the examining
physician when the entering students are given their routine physical
examination.* Ten points each are allowed for ten sections of the
examination, e.g., teeth, heart, lungs. The totals ranged between 83
and 95. The weighting might be questioned but the reliability of the
ratings as they stand is probably high. It is possible, of course, that
confidence is affected by such physical variables as height and health,
but the relationship is not a rectilinear one, nor one that can be seen
by inspection of the scatter plots.

* These ratings were made available through the kindness of Dr. Morris,
Director of Student Health Service.

Flanagan’s factor analysis (2) of the Bernreuter Personality Inven-
tory yielded two factors, one of which he tentatively labeled “self-
confidence.” Bernreuter scores were available for eighty-one of our
group of subjects so it seemed worth while to investigate the relation
between general confidence as treated in the present study and
Flanagan’s “self-confidence,” or F1-C, as Bernreuter calls it. As Table VIII
shows, this correlation was .13, not significantly greater than zero.
Correction for attenuation would increase this correlation, but not
sufficiently to make it significant.
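The remark about attenuation can be illustrated with the standard correction, in which the observed coefficient is divided by the square root of the product of the two reliabilities. Neither reliability is given for these particular scores, so both values below are hypothetical, as is the function name.

```python
import math


def correct_for_attenuation(r_observed: float, rel_x: float, rel_y: float) -> float:
    """Estimate the correlation between true scores from an observed correlation."""
    return r_observed / math.sqrt(rel_x * rel_y)


r_observed = 0.13        # confidence with Bernreuter F1-C, Table VIII
rel_confidence = 0.90    # HYPOTHETICAL reliability of the general confidence score
rel_bernreuter = 0.85    # HYPOTHETICAL reliability of F1-C

corrected = correct_for_attenuation(r_observed, rel_confidence, rel_bernreuter)
print(round(corrected, 3))
```

With reliabilities of this order the corrected value stays well below the .24 named earlier as the level required for significance.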

Closer examination of these variables reveals no strong reason why 
they should be correlated—other than the similarity in the names. 
The variable studied in the present investigation is confidence in a 
judgment, reported after the judgment has been made. Flanagan’s 
Factor One has high weightings in such questions as “Do you blush
very often?” and “Are you troubled with shyness?” The trait,
confidence in a judgment, developed out of psychophysics and is defined
by the subjects quite unambiguously in terms of the instructions. The 
Bernreuter items are based on four personality inventories, and 
Flanagan’s “self-confidence” is a tentative name for a statistically 
defined trait extracted from one hundred twenty-five items by factor 
analysis. Lack of relationship between these two traits is, therefore, 
understandable. 

It was shown above that there is not much relation between 
individual differences in confidence and in actual achievement. The 
relation between confidence and achievement in the individual can 
profitably be examined by considering the records of a few individual 
subjects. Will one’s confidence be higher in one’s stronger fields? 
The question can be attacked by calculating for any subject the rank- 
difference correlation between his confidence scores and his achieve- 
ment scores on the eight tests. Since one correlation of only eight 
tests is of questionable value, correlations were computed for the 
first sixteen subjects taken in alphabetical order. 
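For a single subject the rank-difference correlation over the eight tests can be computed as below. The scores are invented for illustration and belong to no subject in the study; the helper assumes there are no tied scores.

```python
def ranks(values):
    """Rank values from 1 (highest) to n (lowest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def rank_difference_correlation(x, y):
    """Spearman's rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# HYPOTHETICAL scores for one subject on Tests 1 to 8.
confidence = [0.62, 0.55, 0.70, 0.48, 0.66, 0.40, 0.58, 0.81]
achievement = [21, 17, 24, 19, 20, 12, 15, 27]

rho = rank_difference_correlation(confidence, achievement)
print(round(rho, 2))
```

With only eight pairs a single coefficient of this kind is unstable, which is the reason given above for averaging over sixteen subjects.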

In computing these correlations the question immediately arises 
whether to use raw scores or sigma scores. Is a person’s confidence 
determined more by the absolute number of items which give rise to 
doubt, or by an estimate of his relative standing in his group? And 
does a person express his confidence in direct reference to his feeling 
of uncertainty or in reference to a standard of certainty which he 
believes typical? An empirical approach to these questions was fol- 
lowed. Four rank-difference correlations for each of the first sixteen 
subjects were computed: (1) Between the two series of raw scores, 
(2) between the two series of sigma scores, (3) between raw confidence 
and sigma achievement, and (4) between sigma confidence and raw 
achievement. The means of these correlations are presented in 
Table IX, together with critical ratios of the differences between the 
highest and the other three. 




If we take the first mean correlation as significant, we see that the 
relation between the individual’s confidence and his achievement on 
the eight tasks is not so close as one might expect. And it seems from 
these means that the subjects tended to express their confidence in 
absolute terms in reference to the absolute number of items on which 
they were successful. But the means shown in Table IX disguise the
variability. The variability can be seen by noting the four correlations,
following the order of Table IX, for each of four subjects: (Ar) .74, .14,
.29, .21; (Br) -.26, .54, -.37, .04; (Co) .89, .64, .50, .75; (Cr) .24,
-.07, -.02, -.20. Some of this variability may be due to the subjects’
errors in judging and reporting confidence, and some may be due
to the inaccuracy of the rank-difference correlations, so the means have 
some value. But it also seems that there are differences in the way 
the subjects accept the task, or in the habitual modes of thought of 
the subjects. It may be that some subjects, e.g. Br, habitually com- 
pare themselves to others—or did so on this test, at least—while other 
subjects, e.g. Ar, ignore their classmates and concentrate on the test 
material. The present technique is not adequate for a definitive 
investigation of these possibilities. 

TABLE IX.—MEANS OF SIXTEEN CORRELATIONS BETWEEN CONFIDENCE AND
ACHIEVEMENT OBTAINED IN FOUR WAYS


Variables                                             Rho    D/σD
1. Raw confidence and raw achievement...............  .46
2. Sigma confidence and sigma achievement...........  .39    1.5
3. Raw confidence and sigma achievement.............  .26    3.3
4. Sigma confidence and raw achievement.............  .21    4.6


DISCUSSION


The question of a direct relation between individual differences in 
confidence and in achievement must be answered in the negative. 
Since there is no question of the reliability of either series of scores, 
this negative conclusion can be accepted. In a previous study by 
the writer (8) it developed that confidence depends in part upon the 
nature of the material. In judging lengths of lines, for example, 
accuracy was high but the subjects were not confident and had to be 
encouraged to guess. In recognition of small geometrical designs, on 
the contrary, accuracy was little better than chance but confidence 
was high. All tests in the present investigation were similar with the 
exception of Test 8, dealing with sports. It seems likely, therefore,
that the relation between confidence and achievement which obtains
for Test 8 is an indirect relation—resulting, perhaps, from larger 
individual differences in exposure to the material of this test. And it is 
likely that tests could be found, or constructed, in which such a rela- 
tionship would be large. The low correlations between confidence and 
achievement on the other seven tests are probably typical of the usual 
academic material, material which the student has to study. It is 
noteworthy that none of these correlations are negative. It is ques- 
tionable whether negative correlations between confidence and achieve- 
ment could be obtained, even by careful selection of the material of 
the tests. 

The question of generality of confidence in a judgment must be 
answered in the affirmative. All correlations among the confidence 
scores are positive and statistically significant. The mean of the 
correlation coefficients is .518 (raw) or .568 (corrected). From the 
factor analysis we see that almost half the variance in the correlation 
table is due to the general factor of confidence. In the above-men- 
tioned study by the writer the mean of the correlations for mean confi- 
dence on four kinds of judgment was .58 (raw) or .62 (corrected). 
Jersild (7) reported correlations of .75 and .79 between the confidence 
scores on different tests of psychology. In a study of attitudes the 
present writer (9) got mean confidence scores for confidence in dealing 
with the statements on two Thurstone scales: of attitude toward war,
and of attitude toward censorship. The confidence scores from the 
two scales correlated .72 (raw) or .84 (corrected). Since it has been 
shown that confidence is not closely related to achievement, it is not 
necessary to stress the point that the confidence correlations are not 
limited by the achievement correlations. Taking all this evidence 
together one is led to the conclusion that confidence in a judgment is 
a pervasive characteristic of the individual. We shall not here bring 
up general problems of the nature of personality traits. Present 
opinion on these problems has recently been treated by Allport (1) and 
by Newcomb (10). In the writer’s opinion confidence in a judgment 
fits in with modern conceptions of a personality trait. 

Granting, then, that confidence in a judgment may be considered a 
personality trait, how is it related to other variables? The present 
study is not enlightening in this respect, except in a negative way, as 
shown in Table VIII. This is an interesting problem for the future. 
It will suffice here to note that reliable scores for confidence in one 
field of knowledge can be obtained in about fifteen minutes. By
proper selection of the items, scores for general confidence could
probably be obtained in twenty to twenty-five minutes.

What determines the individual’s confidence in judging any given 
item? One determining factor, we have shown, is the individual’s 
characteristic level of confidence. Another factor, one would suspect, 
is the individual’s estimate of the difficulty of the item to him, which 
may or may not be closely related to the correctness of his response 
to the item. It is logical to suppose, from the discussion of Table IX,
that in estimating the difficulty of the item and, consequently, in 
reporting his confidence in dealing with it, the subject is influenced by 
(1) the impression made by the item immediately before him, and 
(2) previous experiences with items of the same category, which give 
rise, justifiably or not, to an atmosphere of certainty or uncertainty. 
The expression of confidence is a judgment in “absolute” units and 
may be influenced by the global or “atmosphere” effects which Sells
has found in judgment of syllogisms (11) as well as by the immediate 
data. 


SUMMARY 


Scores for achievement in eight branches of knowledge were 
obtained from ninety-six unselected college freshmen who took the 
Michigan Vocabulary Profile Test. After judging each item the sub- 
jects reported their confidence on a five-point scale from 0 to 1.00. 
These eight tests of confidence have odd-even reliabilities of .85 to 
.96. Correlations between confidence scores and achievement scores 
are positive but low, with one exception. The correlation between 
confidence and achievement in the test on sports is .67 (raw) or .74 
(corrected). For the whole battery the correlation between confidence 
and achievement is .25 ± .06.

Correlations among the eight tests of confidence range from .29 to 
.72; the mean is .52. Factor analysis (Holzinger’s method) of these 
twenty-eight correlations allocates the variance as follows: The general 
factor 47.6 per cent, two group factors 14.5 per cent, specificity 29 per 
cent, unreliability 8.9 per cent. 

Examination of the records of a few individual subjects disclosed 
that they were in general more confident on those tests on which they 
did well, but the correlations are quite low. 

The conclusion is reached that the chief determinant of confidence
on these tests is the individual’s general feeling of confidence, i.e., a
personality trait.


BIBLIOGRAPHY 


(1) Allport, Gordon W.: Personality. New York: Holt, 1937.
(2) Flanagan, John C.: Factor Analysis in the Study of the Personality.
    Stanford University Press, 1935.
(3) Greene, Edward B.: “Achievement and Confidence on True-False Tests of
    College Students.” Journal of Abnormal and Social Psychology, Vol.
    xxi, 1929, pp. 467-478.
(4) Greene, Edward B.: “Michigan Vocabulary Profile.” Journal of Higher
    Education, Vol. ix, 1938, pp. 383-389.
(5) Greene, Edward B.: Michigan Vocabulary Profile Test. Yonkers, N. Y.:
    World Book Co., 1939.
(6) Holzinger, Karl J.: Student Manual of Factor Analysis. Chicago:
    Department of Education, University of Chicago, 1937.
(7) Jersild, Arthur: “Determinants of Confidence.” American Journal of
    Psychology, Vol. xli, 1929, pp. 640-642.
(8) Johnson, Donald M.: “Confidence and Speed in the Two-Category
    Judgment.” Archives of Psychology, New York, 1939, No. 241, p. 52.
(9) Johnson, Donald M.: “Confidence and the Expression of Opinion.” Journal
    of Social Psychology, Vol. xii, 1940, pp. 213-220.
(10) Murphy, Gardner, Murphy, Lois B. and Newcomb, Theodore M.:
    Experimental Social Psychology. New York: Harpers, 1937.
(11) Sells, Saul B.: “The Atmosphere Effect.” Archives of Psychology, New York,
    1936, No. 200, p. 72.
(12) Trow, William C.: “The Psychology of Confidence.” Archives of Psychology,
    New York, 1923, No. 67, p. 47.




IMMEDIACY OF INTERPOLATION AND AMOUNT 
OF INHIBITION 


F. J. HOULAHAN 
The Catholic University of America 


Although several studies have been devoted to the problem of 
determining the relationship between the amount of time elapsing 
between two learning activities and the amount of resultant retroactive 
inhibition, only two of these?* have dealt with the question in the case 
of elementary-school children. Each of these used disconnected
meaningful materials (lists of verbs and of nouns) for the learning 
activities. 

The study being reported here was undertaken to solve some of 
the problems left unsettled by these latter two. The experiment was 
planned in such a way as to be quite comparable with them. It 
differs from them primarily in the control of the amounts of learning 
involved for both the original and the interpolated activity. For in 
the present investigation the amounts of learning are measured by 
both the amount of time allotted to study and a pretest of the learning 
accomplished in the case of equated learning materials, whereas in the 
studies referred to only the duration of the learning periods was held
constant.


MATERIALS USED


The materials used for this study were two lists of twenty-five 
simple verbs, called “Verbs I” and “Verbs II,” and two lists of forty
simple concrete nouns, designated as “Nouns I” and “Nouns II.”
The Verbs I and Nouns I lists were the same as those used in the studies
referred to above and in those of Foran* and of Lahey.’ The selection 
of words for the Verbs II and Nouns II lists was made by Baird,' whose 
study showed that these lists were equal in difficulty with the Verbs I 
and Nouns I lists, respectively, for pupils in the elementary schools 
of the Dubuque public school system. 


PLAN OF THE EXPERIMENT 


Preliminary to the experiment proper the subjects were given a 
five-minute period for the study of Verbs I. This was followed at once 
by four minutes devoted to written recall of the words learned. On a
second day, but likewise previous to the experiment, the same children
were asked to study the words of Nouns I for four minutes. This 
study period was followed by three minutes of written recall of the 
nouns. 

When these preliminaries had been completed and an interval of 
one day had elapsed, the experiment proper took place. To carry it 
out the experimental population was divided into four groups. One 
of these was used as a control group. For it no period for interpolated
learning was provided between the original learning and the test of 
the learning accomplished. In the other three groups a second learn- 
ing activity took place shortly after this first learning but prior to 
its test. 

Specifically, in Condition 1, the control condition, five minutes 
were devoted to the study of the verbs. These five minutes were 
followed immediately by twenty-one minutes devoted to singing
familiar songs, which was assumed to be “rest” so far as learning
activities are concerned. Then came four minutes for written recall of the
verbs from the list. Twenty-four hours later four more minutes were 
given for written recall of these same words from the Verbs II list. 

The plan for those undergoing the experimental conditions was the 
same as that for the control group except that seven minutes of inter- 
polated work intervened somewhere between the study of Verbs II 
and the first recall of these words. In Condition 2 this took place just 
after the completion of the five minutes devoted to the study of Verbs 
II, allowing fourteen minutes of unbroken “rest” before the recall was
attempted. In Condition 3 four minutes of “rest” were introduced
between the completion of the study of the Verbs II list and the begin- 
ning of the interpolated work, whereas only ten minutes remained from 
the end of the interpolated task to the beginning of the written recall. 
For Condition 4 eight minutes elapsed before the interpolation of the 
work activity, so there remained but six minutes for singing from its 
conclusion till the recall was written out. 
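The three experimental schedules can be checked against the twenty-one minutes allotted between study and first recall. The sketch below simply encodes the intervals stated above; the names are illustrative.

```python
INTERPOLATED_MINUTES = 7  # interpolated work, the same in every experimental condition

# Condition number: (minutes of "rest" before the interpolated work, minutes after it).
schedule = {
    2: (0, 14),
    3: (4, 10),
    4: (8, 6),
}

# Total elapsed time between the end of study and the start of written recall.
elapsed = {
    condition: before + INTERPOLATED_MINUTES + after
    for condition, (before, after) in schedule.items()
}

for condition, minutes in elapsed.items():
    print(f"Condition {condition}: {minutes} minutes")
```

Each condition fills the same twenty-one minutes as the control group's period of singing, so only the position of the interpolated work varies.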

One feature of this procedure which calls for explanation at this 
point is the omission of a test of achievement on Verbs II until the 
twenty-one minutes of allotted experimental time had elapsed. This 
omission was planned to avoid the fixation effect which three or four 
minutes of written recall have upon materials just learned or being 
learned. It seemed not unlikely that any special position effect of 
interpolation immediately subsequent to original learning might be 
lost due to this time elapsing in an occupation which is even more 
favorable to subsequent recall than “rest.”



Moreover, such a test seemed unnecessary since the relative con- 
stancy of the groups in question with respect to their achievement on 
this type of learning was established well enough by their performance 
in the recalls of Verbs I and Nouns I. All ratios used in interpreting
the results are based on the assumption that the four groups are as 
nearly equal in their performance on Verbs II as in their scores on 
Verbs I. Identity in the amounts of this achievement on the different 
verbs lists is not needed for the validity of the results. 


SUBJECTS 


All children of ages eight to fourteen, inclusive, in grades III to
VIII in eight elementary schools conducted by the Sisters of Saint
Francis at LaCrosse, Wisconsin, participated in the experiment. Two
schools made up each condition group. This was intended to allow for
errors in administration. None had to be discarded, however, so all
data reported are based on combinations of two schools. The pupils
were distributed throughout the Conditions as shown in the second
row of Table I.


TABLE I.—RESULTS OF THE EXPERIMENT


                                  Condition     Condition     Condition     Condition
                                      1             2             3             4
Minutes rest....................   Control          0             4             8
Number of cases.................     265           234           212           277
Means and standard deviations
  Age, in years.................  11.20 ± 1.65  10.98 ± 1.86  11.02 ± 1.78  11.24 ± 1.24
  Verbs I, pretest..............  12.02 ± 4.96  12.25 ± 4.11  12.45 ± 4.08  12.57 ± 5.02
  Nouns I, pretest..............  14.18 ± 5.76  14.22 ± 4.85  14.67 ± 6.39  15.69 ± 6.03
  Nouns II, interpolated........  ............  12.76 ± 6.10  12.79 ± 6.29  13.89 ± 6.85
  Verbs II, 21-minute recall....  14.58 ± 7.08   6.19 ± 4.14   6.75 ± 4.44   8.75 ± 5.31
  Verbs II, 24-hour recall......  ..... ± 7.40   4.72 ± 3.54   5.72 ± 4.09   8.08 ± 5.89
Ratios expressed as per cents
  Ratio Verbs II, 21-minute
    recall, to Verbs I..........    121.30         50.53         54.22         69.61
  Ratio Verbs II, 24-hour
    recall, to Verbs I..........    113.56         38.53         45.94         64.28
  Relative retroactive
    inhibition after 21 minutes.    ......         58.34         55.30         42.61
  Relative retroactive
    inhibition after 24 hours...    ......         66.07         59.55         43.40

Although it is known that there are minor sex differences in the 
matter of retroactive inhibition, there seems to be no place for these to 
assert themselves sufficiently to disturb the results in the present 
population. The numbers of boys and girls are quite nearly equal in 
all groups. 

Age, too, is a potent factor in the amount of harm done to retention 
by interpolated learnings, but this seems to be controlled well enough 
according to the information given in the third row of Table I. 

No effort was made to control intelligence, another known factor 
in the amount of inhibition obtained, for it was presumed that the 
deviations in mental age should correspond fairly well with the devia- 
tions in chronological age in groups selected as these were. 


TREATMENT OF THE DATA 


When the experiment had been completed the papers were scored 
and rescored by two groups of graduate students working independ- 
ently of one another. All conflicts in judgment were arbitrated by a 
third group. 

One point credit was given for each word remembered from the
appropriate list. Regardless of spelling or penmanship the presump- 
tion was in favor of a word being right. 

When the scores had been obtained the means and standard 
deviations for the various groups were calculated and relative retroac- 
tive inhibition ratios deduced in accordance with the usual formula: 


\[
\frac{(\text{per cent retained by control group}) - (\text{per cent retained by experimental group})}{(\text{per cent retained by control group})}
\]


where “per cent retained by the control group” is the ratio of the mean
obtained on the first test of Verbs II by the Condition 1 group to the
mean these children obtained on the pretest on Verbs I. The “per
cent retained by experimental group” was determined in the same way,
and is, for this experiment, a ratio of the mean on Verbs II to the mean
on Verbs I.
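The formula can be applied directly to the means in Table I. The following sketch computes the retention ratios and the relative retroactive inhibition for recall after twenty-one minutes; the function and variable names are illustrative only.

```python
def retention_ratio(verbs_ii_mean: float, verbs_i_mean: float) -> float:
    """Per cent retained: mean recall of Verbs II relative to the Verbs I pretest mean."""
    return 100 * verbs_ii_mean / verbs_i_mean


def relative_inhibition(control_ratio: float, experimental_ratio: float) -> float:
    """Relative retroactive inhibition, in per cent, by the usual formula."""
    return 100 * (control_ratio - experimental_ratio) / control_ratio


# (Verbs II mean at 21 minutes, Verbs I pretest mean), from Table I.
control = retention_ratio(14.58, 12.02)
experimental_means = {2: (6.19, 12.25), 3: (6.75, 12.45), 4: (8.75, 12.57)}

inhibition = {
    condition: relative_inhibition(control, retention_ratio(v2, v1))
    for condition, (v2, v1) in experimental_means.items()
}

for condition, value in inhibition.items():
    print(f"Condition {condition}: {value:.2f} per cent")
```

The results agree with the tabled ratios of 58.34, 55.30 and 42.61 to within rounding.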




RESULTS 


The relevant statistical information for each of the four conditions 
of the experiment is summarized in Table I. Examination of this 
table shows that the four groups were quite evenly matched in initial 
scores after memorizing lists of verbs and again after memorizing lists 
of nouns. This lends support to the assumption that the groups were 
evenly matched for the memorizing of the Verbs II list.

It is interesting to note the effect of negative transfer upon the 
scores achieved on Nouns II. The list of nouns used here was of equal 
difficulty with the list called Nouns I. But in all the schools in which 
the learning of Nouns II succeeded that of Verbs II by eight minutes 
or less the means on Nouns II are definitely lower than are those for 
Nouns I. This is contrary to what would be expected on the basis of 
practice effect. The uniformity with which it occurs indicates that it 
is a function of the conditions of the experiment. But since these 
nouns were written down as recalled immediately after studying them, 
there is no possibility that any extraneous work interpolated itself 
between the study and the test. The lower scores can be attributed 
to only one thing then; namely, the temporal proximity to the pre- 
ceding study of the Verbs II list. 


RETENTION 


The amount retained by the control group, Condition 1, after a 
“rest” of twenty-one minutes, as shown by the ratio between Verbs
II and Verbs I, is indeed far in excess of the assumed original learning. 
Part of this excess may be due to actual reminiscence. The rest is 
probably accounted for by practice effect. 

However, the results are very much different in the case of the 
three experimental conditions. Here the retention ratios after twenty- 
one minutes lie between 50.53 and 69.61 per cent in contrast with a 
ratio of 121.30 for the control group. Retention after twenty-four 
hours is even more disparate, for in the experimental groups the ratios 
range from 38.53 for Condition 2 up to 64.28 for Condition 4, whereas 
the corresponding ratio for Condition 1 is 113.56. 


RETROACTIVE INHIBITION 


The relative retroactive inhibition ratios are reported in the 
bottom rows of the table. The row next to the bottom shows the
amount of inhibition occurring when the recall test took place after 
twenty-one minutes. After so short a time it appears that the ratio 
for the two hundred thirty-four children who studied nouns right after 
studying the verbs is 58.34 per cent. For those two hundred twelve 
children who had four minutes of “rest” between the completion of the
study of Verbs II and the taking up of the study of the nouns the
amount of inhibition is not quite so great, the quotient being 55.30
per cent. In the case of the two hundred seventy-seven subjects who
had eight minutes of “rest” prior to the interpolation of other similar
work the retroactive inhibition ratio is much less, only 42.61 per cent.

Before interpreting the results for recall after twenty-four hours it 
is necessary to remember that we are dealing with the same subjects 
who wrote their recalls after twenty-one minutes. There is less likeli- 
hood of further inhibition showing up in their cases than if the recall 
were first practiced at this time. 

Despite this rehearsal there is a large increase in the size of the 
inhibition quotient obtained in the case of the pupils whose original 
learning was succeeded at once by interpolated learning. Their ratio 
is now 66.07 per cent. There is a small increase for the other groups. 
It is surprisingly small, however, in view of the fact that twenty-four 
hours of living must have introduced some new inhibitory factors. 
For the children of Condition 3, with four minutes’ “rest,” the ratio is 
now 59.55. For the Condition 4 group, with eight minutes’ “rest,” 
it is only 43.40.

There was, then, an increase in inhibition quotients for immediate 
interpolation of nearly eight percentage points. For the group with 
interpolation after four minutes the increase in ratios was only four 
points, while for the group with eight minutes of “rest” the quotient
after twenty-four hours is only one point higher than it was after 
twenty-one minutes. 
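These increases are the differences between the two bottom rows of Table I. A short sketch of the subtraction, with illustrative names:

```python
# Relative retroactive inhibition, in per cent, from Table I.
after_21_minutes = {"Condition 2": 58.34, "Condition 3": 55.30, "Condition 4": 42.61}
after_24_hours = {"Condition 2": 66.07, "Condition 3": 59.55, "Condition 4": 43.40}

# Gain in the inhibition quotient between the first and second recall.
increases = {
    condition: round(after_24_hours[condition] - after_21_minutes[condition], 2)
    for condition in after_21_minutes
}

for condition, gain in increases.items():
    print(f"{condition}: {gain:.2f} points")
```

The gains of roughly 7.7, 4.3 and 0.8 points correspond to the "nearly eight," "four" and "one" points described in the text.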

Moreover, there is good reason to believe that the amount of 
inhibition after twenty-four hours would have been much greater had 
not the recall taken place after twenty-one minutes. Such, at least, 
was found to be true in the analogous case of the two studies previously 
referred to. For convenience in discussing this point these studies 
will be designated as the Dubuque‘ and Nauvoo? experiments, thereby 
distinguishing them from the present study, the data for which were 
obtained in LaCrosse. 

In the first of these investigations over four thousand children 
underwent experimental conditions as similar as possible to those to
which the LaCrosse pupils submitted except that, instead of establish- 
ing by means of a pretest the amount of learning accomplished, the 
Dubuque subjects wrote out a recall during the four minutes immedi- 
ately succeeding the study of the verbs. The Nauvoo study was just 
like the one at Dubuque except that the recall after twenty-one 
minutes was omitted. 

The average of the retroactive inhibition quotients for all experi- 
mental conditions at the end of twenty-four hours was reported as 
28.73 for the Dubuque study. The corresponding figure for Nauvoo 
was 36.47. For LaCrosse, where the original learning was not stamped 
in by immediate recall, it was much higher, viz., 56.34. In other words, 
the recall after twenty-one minutes at Dubuque had arrested the 
inhibitory effect of the interpolated work to some quite notable extent. 
There seems to be every reason for believing that the same result 
occurred at LaCrosse where, also, recall was practiced after twenty-one 
minutes. 


SUMMARY AND CONCLUSIONS 


The results of this experiment warrant the conclusions that, in the 
case of elementary-school children studying disconnected meaningful 
materials (verb lists), when a second learning activity similar in nature 
to the first but involving no identical elements (lists of nouns) is 
introduced either immediately after completion of study of the first 
material or after four or eight minutes of comparative “rest” have
elapsed, and when memory for the original learning is scored on the 
basis of written recall after twenty-one minutes and after twenty-four 
hours have elapsed: 

(1) There is a tendency for the first learning to inhibit the second 
learning. This is frequently called negative transfer⁷ or proactive
inhibition.⁶

(2) The second learning inhibits the retention of the first studied 
material. This is called retroactive inhibition. 

(3) The amount of retroactive inhibition is a function of the time 
elapsing before interpolation of the second activity. 

(4) The sooner the interpolated learning is introduced after the 
first learning the greater is the amount of inhibition. 

(5) The amount of retroactive inhibition is a function of the time 
elapsing before recall. 

(6) The amount of the temporal relation described above (Con-
clusion 4) is accentuated when the subjects attempt a second recall
after twenty-four hours. Much more inhibition shows up for immedi-
ate interpolation than for interpolation after eight minutes.

(7) Rehearsal after twenty-one minutes almost completely 
arrested the effect of inhibition for those subjects who “rested” for
eight minutes, so that the inhibition ratio after twenty-four hours was 
found to be approximately the same as after twenty-one minutes. 


REFERENCES 


(1) Baird, W.: Equated Word Lists for Memory Experiments. Unpublished M. A.
thesis, 1940, Catholic University of America.

(2) Darham, Sr. M. Rose: Retroactive Inhibition: The Effect of Variations in
Temporal Position of Interpolated Learning on Delayed Recall. Unpublished
M. A. thesis, 1940, Catholic University of America.

(3) Foran, T. G.: “Retroactive Inhibition in Relation to Age and the Nature of
the Interpolated Task.” J. Educ. Psychol., Vol. xxviii, 1937, pp. 451-460.

(4) Houlahan, F. J.: “Retroactive Inhibition as Affected by the Temporal Position
of Interpolated Learning Activities in Elementary-school Children.”
Cath. Univ. of Amer. Educ. Res. Monog., Vol. x, 1937, No. 3, pp. 27.

(5) Lahey, Sr. M. Florence Louise: “Retroactive Inhibition as a Function of Age,
Intelligence, and the Duration of the Interpolated Activity.” Cath. Univ.
of Amer. Educ. Res. Monog., Vol. x, 1937, No. 2, pp. 93.

(6) Matousek, Sr. M. Adelbert: “Reproductive and Retroactive Inhibition as a
Function of Similarity in the Recall and Recognition of Paired Associates.”
Cath. Univ. of Amer. Educ. Res. Monog., Vol. xii, 1939, No. 1, pp. 42.

(7) Webb, L. W.: “Transfer of Training and Retroaction. A Comparative
Study.” Psychol. Monog., Vol. xxiv, 1917, No. 104, pp. 90.




STUDIES IN THE PSYCHOLOGY OF MEMORIZING 
PIANO MUSIC. IV. THE EFFECT OF INCENTIVE* 


GRACE RUBIN-RABSON 
New York City 


During the course of several previous experiments in memorizing 
piano music even the most capable subjects lapsed into seemingly arid 
areas in the learning. The reasons generally advanced for these lulls 
are simple and practical: Fatigue, lack of interest and attention, 
absence of the “will to do.” Book¹ sums up the matter when he
asserts that less effort is actually put into the work at all those stages 
where little or no improvement is made. 

That these areas occurred after only three or four repetitions of an 
eight-measure period, however, obviates the possibility of fatigue. 
In an experimental situation in which the subjects are all professional 
pianists and, therefore, concerned to maintain prestige, interest and 
“will to learn” are assured.

The present experiment was designed to investigate whether the 
brief “plateaus” which appear in the complex organization of perfect 
memorized performance are flat only because instruments of measure- 
ment are crude, or whether they can be eliminated or reduced by proper 
incentives. 

To date, studies in motivation are meager. Work with humans is 
largely confined to schoolroom learning situations and knowledge of 
results is the most frequent spur to increased production. 

Chapman and Feder² conclude that incentive operates only when
the task is long enough; that for very short periods production is as
high without motivation. Pertinent to this observation is Kitson’s⁴
hypothesis that men of long experience in a given trade, under the 
stimulus of added bonuses can improve output only after considerable 
time, since it is not mere speedup or greater expenditure of effort but 
actual learning of more efficient procedures which is stepping up 
production. According to Leuba,‘ not only will social conditions vary 
the animating power of an incentive, but this power is greater among 
those for whom the work as such has the least stimulating value. 


THE PROBLEM 


For the nine subjects coöperating in this experiment, the work
itself had unusual personal value. The memorizing techniques had
for them a professional interest as pianists and teachers; in addition,
their speed in learning in the experimental situation constituted an
exhibition of skill and capacity.

* Thanks are due Sophie Rabinovitch for a grant-in-aid.

It may be assumed, then, that the customary causes of periods of 
unfruitful learning do not exist in the present instance. The problem 
may be stated as follows: Can the apparently sterile trials interposed 
between more obviously fertile ones which occur in the learning of 
efficient and eager subjects be eliminated and the learning speeded-up 
and intensified by verbal exhortation and cash incentives? Will the 
speed-up and intensification improve retention and strengthen the 
clarity and accuracy of a transcription of the memorized material? 


THE EXPERIMENT* 


Methods.—Three kinds of learning are compared. In the first, A,
the subject studies the material silently for six minutes,† then brings
it to perfect memorized keyboard performance. There is no urging
and he works at his own rate of speed. In the second, B, before the
six-minute study period, he is urged to work at his maximum speed
and intensity and warned that this is of the greatest importance. The
learning then continues as before. In the third, C, he is told before
the study period that, for the balance of the learning, he will be paid
in proportion to his speed, i.e., the greater the speed, the greater the
cash remuneration.‡ Not until the relearnings were completed two
weeks later did the subject have any idea of his success§ so that no
knowledge of results could produce that “relevant satisfying after
effect (which) strengthens very greatly the connection which it
immediately follows and to which it belongs but (it) also strengthens
to a much less extent punished connections which are in close enough
proximity to it.”¹²

* The experiment was conducted in the studio of the experimenter in New
York City between June and September, 1939.

† For the value of silent study preliminary to keyboard trials see reference 7.

‡ A word is necessary here concerning the remuneration and its possible
affective value on the incentive. For previous coöperation, six of the nine sub-
jects had each received an attractive sum regardless of ability or speed, and
expected the same for this. The three new subjects recommended by these six
were led, naturally, to expect the same. When, for the C method, this amount
seemed to be threatened, a definite emotional reaction occurred. Both reward
and punishment seem to be operative; the reward consisting of the maintenance
of the sum; the punishment, the fear of losing it.

§ Though the amount of the reward or punishment was not announced, every
subject immediately assumed that it would be large, and that he was therefore
due to lose most of his remuneration. Even the most capable subjects exhibited
this curious lack of self-confidence. Thorndike and Forlano¹¹ found that so great
an increment as eight-tenths of a cent produced sufficient excitement and strain to
reduce learning efficiency to below the four-tenths-of-a-cent level. (Fortunately
the present subjects were not so labile, though most of them observed that the
incentive was slowing them up.)

Experimental Design.—The Latin-Greek square is admirably 
adapted to a situation in which neither the subjects nor the experi- 
mental materials can be equated. The nine subjects, divided into 
three groups of three each, performed the experiment three times, 
making a total of twenty-seven learnings by each of the three methods. 
Each group of three followed a smaller Latin-Greek square, and the 
three groups together formed a large one. In the present instance, 
however, the methods themselves were not rotated but presented in 
the same order to all subjects to preserve the cumulative effects of 
least to greatest incentive. 
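
The balancing property of such a square may be illustrated by a minimal sketch. The construction below (two orthogonal 3 x 3 Latin squares built over the integers modulo 3) is a standard one and is offered only as an assumption about the general form of the design; the article does not state which factors were assigned to rows, columns, or symbols, and the function names are hypothetical.

```python
from itertools import product


def graeco_latin_3():
    """Return two orthogonal 3x3 Latin squares built over the integers mod 3."""
    n = 3
    latin = [[(i + j) % n for j in range(n)] for i in range(n)]
    greek = [[(i + 2 * j) % n for j in range(n)] for i in range(n)]
    return latin, greek


def is_latin(square):
    """Every symbol appears exactly once in each row and each column."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


def is_orthogonal(a, b):
    """Superimposing the two squares yields every ordered pair exactly once."""
    n = len(a)
    pairs = {(a[i][j], b[i][j]) for i, j in product(range(n), repeat=2)}
    return len(pairs) == n * n


latin, greek = graeco_latin_3()
for latin_row, greek_row in zip(latin, greek):
    # Print each cell as a (Latin symbol, Greek symbol) pair.
    print(list(zip(latin_row, greek_row)))
```

Because every pair of symbols occurs once and only once, two uncontrolled factors can be balanced against rows and columns at the same time, which is the property the design relies upon.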

Subjects.—The nine subjects, age twenty to twenty-six years, 
comprised eight women and one man. From Table I it can be seen 
that the amount of their musical training places them all in either the 
professional musician or highly skilled amateur class. This is true 
as well of the experimental achievement which is high but by no means 
equal. 

Experimental Materials.—Nine complete musical compositions,
ranging from eight to fourteen measures in length and from grades one 
to three in difficulty, were arranged and adapted from unfamiliar 
piano music by Nardini, Loeilly, Martini, Sacchini, Matielli, Rameau, 
and Zipoli. Table I indicates the range of difficulty. 


TABLE I.—MUSICAL BACKGROUND AND EXPERIMENTAL ACHIEVEMENT OF THE
NINE SUBJECTS AND THE RANGE OF THE NINE EXPERIMENTAL COMPOSITIONS

                              Mean      Range
Subjects:
                              12.       9-19
  Theoretical training        16.4     11-20
Compositions:

Procedure.—For each learning only one subject and the experi-
menter were present. Six minutes of silent study and analysis of the
material preceded the keyboard learning.⁷ At the keyboard no errors
of note or rhythm were allowed; each trial proceeded from beginning
to end of each composition,’ by the coördinated approach without
intermediate separate hand trials, and the learning was brought to perfect
memorized performance in one sitting. Each subject learned three
compositions on each of three successive days, the first three without
any injunction as to speed, the second three preceded by a verbal
exhortation that speed was of prime importance,* and the third three†
animated by the announcement that speed had cash value.‡

Speed of performance was unregulated. Only the number of 
required trials was checked. Two weeks later the compositions were 
relearned in the same order without preliminary study§ and brought 
again to the same standard of perfect memorized performance. After 
this the composition was transcribed from memory. 


ANALYSIS OF DATA 


The effectiveness of the incentive is measured in several ways. 
First are compared the differences in economy of the three learning 
methods according to the needed number of keyboard trials; second, 
a similar comparison of differences in economy of the required number 
of relearning trials as a measure of the efficiency of the three methods 
in insuring retention; third, an evaluation of the relative accuracy of 
a transcription of the musical score after the relearning as an indication 
of the precision and clarity of the images remaining when both the 
overt kinaesthetic and tonal factors have been removed. 


* At this point the subjects exhibited no untoward reaction; a few observed
mildly, however, that they could probably work no faster.

† The writer believes that these subjects were already at a high level of learning
efficiency and that their learning techniques could not be improved without outside
aid. Therefore, the length of the motivated learning period would not operate to
encourage more efficient skills, hence, a relatively short period should be as reveal-
ing as a longer one.

‡ The emotional reaction to this situation betrayed itself in various comments:
“That’s supposed to be bad pedagogy, isn’t it?” “I guess the money doesn’t
mean very much to me.” “That depends on your political point of view, doesn’t
it?” “Money isn’t the best form of motivation, is it?” etc., etc.

§ This probably accounts for the greater number of trials required for the
relearning than for the learning. How much actual economy in keyboard trials
is effected by the preliminary study is discussed in a subsequent report.


As a further check on these three measurements the variance is 
analyzed to determine the significance of each of the several variables 
in the experimental situation; namely, methods, subjects, experi- 
mental compositions, order of the learning, and residual error or 
unknown variables.*

Table II shows the means and standard deviations of the learning
and relearning trials and of the transcription scores; in the latter,
since only errors were checked, the lower the score the greater the
accuracy.


TABLE II.—MEANS AND STANDARD DEVIATIONS OF THE LEARNING TRIALS,
RELEARNING TRIALS, AND TRANSCRIPTION SCORES, AND THE SIGNIFICANCE OF
THE DIFFERENCES OF THE MEANS

                                Mean     SD     Difference     SE of        Ratio
                                                of means       difference
Learning trials:
  A—free learning               6.74    3.78    A-B,   .18     1.04         .17
  B—verbal exhortation          6.56    3.74    A-C,   .74     1.02         .72
  C—cash incentive              6.00    3.66    B-C,   .56     1.02         .55
Relearning trials:
  A—free learning               6.66    3.30    A-B,   .03      .80         .04
  B—verbal exhortation          6.63    2.46    A-C,  -.38      .88         .43
  C—cash incentive              7.04    3.08    B-C,  -.41      .76         .53
Transcription scores:
  A—free learning               7.81    8.64    A-B,   .96     2.14         .45
  B—verbal exhortation          6.85    6.68    A-C,  1.15     2.31         .49
  C—cash incentive              6.66    8.04    B-C,   .19     2.04         .09


Learning Trials.—The means of the learning trials (Table II)
show small actual differences which are not statistically reliable, 
though the progressively diminishing tendency in both means and 
standard deviations in favor of the stimulated learnings may indicate 
a trend. There is no trace in the data for C of the confusion and 
unpleasantness manifested during the experiment at this point. 
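
The statement that the differences are not statistically reliable can be checked with a short sketch. It assumes that the "Ratio" column of Table II is the difference of means divided by the standard error of that difference; the differences and standard errors are those printed in the table, while the variable and function names are hypothetical.

```python
# (difference of means, SE of difference) for each pair of methods, from Table II.
table_ii = {
    "learning": {"A-B": (0.18, 1.04), "A-C": (0.74, 1.02), "B-C": (0.56, 1.02)},
    "relearning": {"A-B": (0.03, 0.80), "A-C": (-0.38, 0.88), "B-C": (-0.41, 0.76)},
    "transcription": {"A-B": (0.96, 2.14), "A-C": (1.15, 2.31), "B-C": (0.19, 2.04)},
}


def critical_ratio(difference, se_difference):
    """Absolute difference of means expressed in units of its standard error."""
    return abs(difference) / se_difference


ratios = {
    measure: {pair: critical_ratio(*values) for pair, values in pairs.items()}
    for measure, pairs in table_ii.items()
}

# The largest ratio over all nine comparisons.
largest = max(r for pairs in ratios.values() for r in pairs.values())

for measure, pairs in ratios.items():
    for pair, ratio in pairs.items():
        print(f"{measure:13s} {pair}: {ratio:.2f}")
print(f"largest ratio: {largest:.2f}")
```

Under this reading no difference amounts to as much as one standard error, and the recomputed ratios agree with the legible entries of the table to within about a hundredth.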

* For a more complete rationale of the analysis of variance in a similar experi-
ment see reference 7.

A closer analysis of the data for the three most capable and the
three least capable subjects reveals that this steady diminution is
attributable entirely to the least capable subjects. These improved
production in B by sixteen and one-half per cent over A, and in C by
eleven per cent over A, while the most capable diminished production
by five and one-half per cent in B as compared with A, with no differ-
ences between A and C. This finding is corroborated by Leuba⁶ who
maintains that slow(er) workers show more gain proportionally than
fast ones when an incentive is introduced.

There are apparently no economies for learning in any of the three 
methods and it may be inferred that every trial, even when seemingly 
ineffectual, is a necessary step, since no amount of will or intense 
attention could reduce the number. 

Relearning Trials.—The relearning trials also show small actual
differences in the means and are again statistically unreliable. The 
small advantage for the C method in the learning trials is reversed. 
Here, however, the trend is due to the most capable subjects who did 
progressively less well through B and C while the least capable did as 
well in C as in A, with some advantage still accruing to B. 

For several reasons discretion warns against interpretation of these 
small relearning differences otherwise than as pure chance: First, the 
relearnings were unmotivated;* second, the original learnings showed
only a slight tendency for a few subjects to do better by any of the 
methods; third, a two-week interval might be expected to eliminate 
this small advantage. 

There is little difference in the learning and relearning averages.†
This can not be interpreted as total lack of retention, but is probably 
imputable to the six-minute study period before each learning which 
was omitted from the relearning. How large a reduction in the learn- 
ing keyboard trials is attributable to this factor can not be determined 
in this situation. 

* Valentine,¹³ studying maze learning in rats, reports that if retention is tested
by the whole period of relearning without further punishment, there is no reliable
difference between the retention of what has been learned with and without
punishment.

† The relationships between learning and relearning speed, and these with
transcription score, piano experience and theoretical training have been presented
in three previous reports.⁸,⁹,¹⁰ The size and direction of the correlations are con-
sistent enough in all three to obviate the need for further computation in the
present instance. The relationships between learning, relearning and transcrip-
tion are always high and positive; low between these and piano experience and
theoretical training.

Transcription Scores.—For the accurate transcription of the mate-
rial after relearning, no method shows any real advantage. The large
standard deviations indicate much greater differences among the sub-
jects in this capacity than in the more routine ability to memorize
these relatively simple compositions.

For the reasons given above in the analysis of the relearning trials, 
no attempt is made to explain the small fluctuations in the means of 
these scores. 

Analysis of Variance.—That the methods were ineffectual appears 
clearly in Table III, where they account for only three per cent of the 
total fluctuation around the mean. The ratio of methods to the resid-
ual error is so much smaller than Fisher’s³ one per cent value (that
value which might be exceeded in random sampling from a homogene- 
ous population once in a hundred trials) that its insignificance is defi- 
nitely established. The effect of order in learning and relearning and 
both residual errors are satisfactorily small, being five per cent or less. 
The dominant variables throughout are the subjects and the experi- 
mental compositions which account for the bulk of the variation. 


TABLE III.—ANALYSIS OF THE VARIANCE OF THE LEARNING TRIALS, RELEARNING
TRIALS, AND TRANSCRIPTION ERRORS, SHOWING THE PERCENTAGE OF THE
TOTAL DUE TO EACH VARIABLE, AND A COMPARISON OF THE RATIO OF THE
VARIABLE AND THE RESIDUAL ERROR WITH FISHER’S ONE PER CENT VALUE

                  Fisher’s      Learning trials    Relearning trials   Transcription errors
Variable          one per
                  cent value    Ratio   Per cent   Ratio   Per cent    Ratio   Per cent
Subjects            2.82        16.0     52.1      18.1     71.0        3.1     42.0
Methods             4.98         1.0      3.3       0.4      2.8        0.2      3.0
Compositions        2.82        11.8     38.0       3.4     18.0        2.9     39.0
Order               4.98         1.0      3.3       0.4      2.8        0.2      3.0
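
The share of the total left to residual error is not listed in Table III, but it may be recovered by subtraction. The sketch below assumes that the four rows of the table are, in order, subjects, methods, compositions, and order of learning (a reading inferred from the text and from the Fisher values), and that the percentages together with the residual sum to one hundred; the names are hypothetical.

```python
# Percentage of the total variation attributed to each variable (Table III).
percent_of_total = {
    "learning": {"subjects": 52.1, "methods": 3.3, "compositions": 38.0, "order": 3.3},
    "relearning": {"subjects": 71.0, "methods": 2.8, "compositions": 18.0, "order": 2.8},
    "transcription": {"subjects": 42.0, "methods": 3.0, "compositions": 39.0, "order": 3.0},
}


def residual_percent(parts):
    """Whatever is not attributed to a named variable is left to residual error."""
    return round(100.0 - sum(parts.values()), 1)


residuals = {measure: residual_percent(parts) for measure, parts in percent_of_total.items()}

for measure, value in residuals.items():
    print(f"{measure:13s} residual error: {value:.1f} per cent")
```

On these assumptions the residual for the transcription errors comes to thirteen per cent, and the residuals for learning and relearning fall near three and five per cent respectively.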


The range of capacity among these superior subjects and the range 
of difficulty among the simple compositions (Table I) explain the size
of these percentages. It may be that the preliminary study period 
accounts for the smaller fifty-two per cent for the subjects in the learn- 
ing as compared with the seventy-one per cent for the subjects in the 
relearning when no study period preceded the keyboard trials. 

For the thirty-eight per cent due to compositions in the learning,
contrasted with only eighteen per cent in the relearning, only a theory
can be advanced. In the bringing of more difficult material to per-
fection many more trials are needed, which, in a measure, guarantee
better formed hand patterns than do the easier ones. Since hand
patterns persist when concepts may not, perhaps this accounts
for the reduction of differences among the compositions during the
relearning.

In the transcription score, the sole departure in relative contribu- 
tion is the size of error, thirteen per cent, due, perhaps, to lack of 
confidence and impatience on the part of the subjects resulting in 
carelessness in the writing.* The sources of error can, however, be 
only surmised. 


SUMMARY AND CONCLUSIONS 


“Plateaus” or apparently sterile stretches in learning have been 
attributed by various investigators to various causes, chief among 
which are fatigue, lack of interest and attention, and relaxation of 
the “will to do.”

It has been observed, however, in the course of several experiments 
in memorizing piano music in which professionally skillful pianists 
coöperated as subjects, that several seemingly unproductive trials are
interposed between more obviously fruitful ones. Professional interest 
and pride and the desire to establish prestige in the experimental 
situation rule out most of the suggested causes except fatigue. This 
latter, too, becomes invalid since these level areas often occur as early 
as a minute or two after the inception of learning. 

The present experiment was designed to discover the nature of 
these “plateaus.” Could they be eliminated by means of sufficient
incentive, or were they an inherent part of the learning curve, too 
gentle in slope for measurement? 

Toward this end, three kinds of learning were compared: Free or 
unmotivated except for the factors already existing in the situation; 
spurred by a verbal exhortation to work with maximum speed and 
intensity; urged by the incentive of a cash remuneration in proportion 
to speed. 

Nine subjects of advanced musical accomplishment repeated the
entire experiment three times using different learning materials each
time, totalling twenty-seven learnings by each of three methods, or
eighty-one in all. The nine unfamiliar experimental compositions
comprising complete musical wholes ranged from eight to fourteen
measures in length and were adapted from the piano works of relatively
obscure Eighteenth Century composers.

* The subjects found the transcriptions a tedious part of the experiment, and
personality may play an important part in the results.

Each keyboard learning was preceded by a six-minute study period,
which was omitted from the relearning situation two weeks later. 
The trials followed the “whole” and “massed” procedures. After 
the relearning, each composition was transcribed from memory as a 
further check on the vividness of the mental concepts. 

The learning shows no reliable differences by any of the three 
methods. The very slight disparity in the means of the methods is 
ascribable to the least capable subjects who evince some small increase 
in production by the two stimulated approaches. The relearning 
trials also reveal no differences. No apparent advantage accrues to 
any method in affording better retention. This is further corroborated 
by the transcription scores whose means coincide at approximately 
the same values. 

It may be concluded that for each individual capacity a minimum 
number of trials is required to bring the learning of a complicated skill 
to a given point of perfection and that this number can apparently 
not be reduced by added incentives. The seemingly sterile trials are 
actually not so, but are an intrinsic part of the learning curve and can, 
therefore, not be eliminated. Interest in the work itself, pride in 
personal production and efficient learning techniques are at least as 
effective as any irrelevant incentives that can be applied externally. 
The most hopeful justification for the latter is their possible stimulus 
value in creating this interest, pride and efficiency. 


BIBLIOGRAPHY 


1. Book, W. F.: Psychology of Skill. Univ. Montana Pub. in Psych., 1908.

2. Chapman, J. C. and Feder, R. B.: “Effect of External Incentives on Improve-
ment.” Jour. Ed. Psych., Vol. viii, 1917, pp. 469-474.

3. Fisher, R. A.: Statistical Methods for Research Workers. Oliver and Boyd,
Edinburgh, 1934.

4. Kitson, H. D.: “A Study of the Output of Workers under a Particular Wage
Incentive.” Univ. Journal of Business, Vol. i, 1922, pp. 54-68.

5. Leuba, C.: “Measurement of Incentives and their Effect: a Contribution to
Methodology and Orientation Resulting from the Experimental Use of
Incentives.” Jour. Soc. Psych., Vol. iii, 1932, pp. 107-114.

6. Leuba, C.: “A Preliminary Experiment to Quantify an Incentive and Its 
Effects.” Jour. Abn. and Soc. Psych., Vol. xxv, 1930, pp. 275-288. 

7. Rubin-Rabson, G.: “The Influence of Analytical Pre-study in Memorizing 
Piano Music.” Archives of Psych., No. 220, 1937. 


8. Rubin-Rabson, G.: “Studies in the Psychology of Memorizing Piano Music.
I. A Comparison of the Unilateral and the Coordinated Approaches.”
Jour. Ed. Psych., May, 1939.

9. Rubin-Rabson, G.: “Studies in the Psychology of Memorizing Piano Music.
II. A Comparison of Massed and Distributed Learning.” Jour. Ed. Psych.,
April, 1940.

10. Rubin-Rabson, G.: “Studies in the Psychology of Memorizing Piano Music.
III. A Comparison of the Whole and the Part Approach.” Jour. Ed.
Psych., September, 1940.

11. Thorndike, E. L. and Forlano, G.: “Influence of Increase and Decrease of the
Amount of Reward upon the Rate of Learning.” Jour. Ed. Psych., Vol.
xxiv, 1933, pp. 401-411.

12. Thorndike, E. L.: Psychology of Wants, Interests, Attitudes. New York:
Appleton Century Co., 1935, p. 30.

13. Valentine, R.: “Effects of Punishment for Errors on the Maze Learning of
Rats.” Jour. Comp. Psych., Vol. x, 1930, pp. 35-53.


A STUDY OF THE THURSTONE PRIMARY MENTAL 
ABILITIES TESTS APPLIED TO FRESHMAN 
ENGINEERING STUDENTS 


ROBERT G. BERNREUTER AND CHARLES H. GOODMAN 
The Pennsylvania State College 


Professor L. L. Thurstone’s²,³,⁴,⁵,⁶ disclosure of seven primary
mental abilities by means of his factoring technique and his subsequent
publication of tests⁷ designed to measure these abilities have aroused
widespread interest in the field of mental measurement. The purpose 
of the authors in conducting this investigation was to examine some 
of the practical uses of these tests. 


SUBJECTS 


One hundred seventy freshman engineering students of The 
Pennsylvania State College served as subjects. These students 
volunteered to act as subjects upon the offered incentive that each 
person volunteering to take part would receive privately a psycho- 
graph of his own abilities. A companion study of these one hundred 
seventy freshmen engineers on the basis of the DeCamp psychological 
examination, showed them to be an excellent sampling of the entire 
freshman class of the College. However, the freshman class cannot be 
considered as a random sampling of the general population. Due to 
the limited facilities of the College, only fifty per cent of the high-school 
graduates seeking admission are accepted, and admission is based 
solely upon high-school record and college aptitude tests. The 
students admitted are almost exclusively from the upper forty per cent 
of the high-school population in achievement. This selective process 
has, of course, attenuated in an undetermined amount all of the coeffi- 
cients reported in this study. 


PROCEDURE 


The tests used in this investigation were labelled ‘‘ Experimental 
Edition” and were provided by the American Council on Education. 
The tests are bound in three booklets and are so constructed that they 
can be scored either by hand or by machine. They are designed to 
measure seven primary abilities. These are a perceptual ability, a 
number ability, a verbal ability, a spacial ability, a memory ability, an 
inductive ability, and a reasoning ability. Either two or three sub-
tests, as shown in Table I, are used to measure each of these primary
abilities.

TABLE I.—COMPARISON OF THE RELIABILITIES OF THE SUBTESTS AND PRIMARY
ABILITIES WITH THOSE REPORTED BY THURSTONE

                                            Reliability
Ability    Subtests                   College      Thurstone’s
                                      students     high-school
                                                   students

           Identical form               .96
           Verbal enumeration           .98

           Addition                     .97
           Multiplication               .98

           Cards                        .97
           Figures                      .96

           Word number
           Initials                     .80

           Completion                   .76
           Same-opposite                .95

           Letter grouping              .83
           Marks                        .75
           Number patterns              .96

           Arithmetic                   .80
           Mechanical movements         .82
           Number series                .91


At the close of the first semester, the grades for each of the subjects 
were obtained for the following courses: Chemistry, English composi- 
tion, mathematics, and engineering drawing, as well as the semester 
average, which is a combination of all freshman courses. The scores 
made by these subjects on The Pennsylvania State College Psychologi-
cal Examination¹ prepared by Dr. J. E. DeCamp were also recorded.
The DeCamp test has been carefully standardized and has been used
for a number of years. In previous comparisons with the more
commonly used college aptitude tests, it has been consistently found
to be among the best.

RESULTS 


The reliabilities of each of the tests were computed by the split- 
half Spearman-Brown technique. As shown in Table I, they vary 
from .75 to .98 with a median of .93.

The reliabilities of the factors were also found by the same tech-
nique. They varied from .89 to .99. A comparison of the reliabilities
of the factors as found in this study shows a close correspondence with
the reliabilities as found by Thurstone.⁷
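
The split-half Spearman-Brown technique named above may be sketched as follows. The formula is the standard Spearman-Brown prophecy formula; the function names and the illustrative half-test correlation of .60 are assumptions, since the article reports only the stepped-up reliabilities.

```python
def spearman_brown(r_half, k=2.0):
    """Estimated reliability of a test k times as long as the part that gave r_half."""
    return k * r_half / (1.0 + (k - 1.0) * r_half)


def half_correlation_for(r_full):
    """Correlation between halves that would step up to r_full (the k = 2 case)."""
    return r_full / (2.0 - r_full)


# A correlation of .60 between the two halves steps up to a full-test value of .75.
print(round(spearman_brown(0.60), 2))

# Half-test correlations implied by the lowest and highest reported reliabilities.
for r_full in (0.75, 0.98):
    print(r_full, round(half_correlation_for(r_full), 3))
```

Read in reverse, the lowest reported subtest reliability of .75 corresponds to a correlation of .60 between the two halves of that subtest.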


TABLE II.—A COMPARISON OF THE FRESHMAN ENGINEERS AND HYDE PARK
HIGH SCHOOL SENIORS BASED UPON THE SEVEN PRIMARY ABILITIES

Ability        Group          N      Mean     Sigma     D/σD

Perceptual     Engineers     170    149.50    21.20     -1.17
               Hyde Park     300    152.00    25.00

Number         Engineers     170    127.50    37.00      1.06
               Hyde Park     300    124.00    30.00

Verbal         Engineers     170     90.10    22.90      2.48
               Hyde Park     300     84.00    30.00

Space          Engineers     170    139.50    32.70      7.20
               Hyde Park     300    114.00    40.00

Memory         Engineers     170     16.94     3.04      1.18
               Hyde Park     300     16.00     9.00

Induction      Engineers     170     32.41     8.75     -2.60
               Hyde Park     300     34.50     9.00

Deduction      Engineers     170     73.91    16.10     10.36
               Hyde Park     300     56.50    20.00


The freshman engineering students were then compared with the 
three hundred Hyde Park High School seniors tested by Thurstone. 
These results are given in Table II. The results shown for the 
Thurstone study are the authors’ own computations, based upon the 
normalized distributions, which are the only ones that Thurstone has 
reported to date. 

It is apparent from an inspection of Table II that the engineers did
significantly better than the high-school seniors in deductive reasoning,
in space, and possibly in verbal ability; and possibly poorer than the
high-school seniors in induction. The dispersion of the scores for
engineering students was smaller, except for number ability, and was
especially restricted in the memory factor.


TABLE III.—INTERCORRELATIONS AMONG THE PRIMARY ABILITIES

            P      N      V      S      M      I      D
N
V
S          .41    .37    .24
M          .05    .15    .17    .12
I          .29    .29    .21    .39    .10
D          .13    .17    .29    .24    .03    .45
Median     .30    .30    .26    .30    .11    .29    .20


The intercorrelations among the various abilities were then 
obtained. These results are given in Table III. Several interesting 
facts are immediately apparent. All of the intercorrelations are 
positive. Nevertheless, they are low, ranging from .03 to .45 with a 
median of .24. As judged by median values, memory appears to be 
more independent than any of the other abilities, with deductive 
reasoning next, followed by the verbal ability. It is apparent that the 
goal of completely independent abilities has not been attained in this 
experimental edition. 
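
The median values referred to above can be recomputed for four of the abilities. The sketch below is only a check under a stated assumption: that the legible rows of Table III are the S, M, I, and D rows of a lower-triangular matrix whose columns run P, N, V, S, M, I, D. The three coefficients among P, N, and V are not legible in the table, so medians are computed only for the abilities whose six coefficients are all available.

```python
from statistics import median

# Legible coefficients of Table III, keyed by the pair of abilities.
# Assumption: the rows are S, M, I, D and the columns run P, N, V, S, M, I, D.
R = {
    ("P", "S"): 0.41, ("N", "S"): 0.37, ("V", "S"): 0.24,
    ("P", "M"): 0.05, ("N", "M"): 0.15, ("V", "M"): 0.17, ("S", "M"): 0.12,
    ("P", "I"): 0.29, ("N", "I"): 0.29, ("V", "I"): 0.21, ("S", "I"): 0.39,
    ("M", "I"): 0.10,
    ("P", "D"): 0.13, ("N", "D"): 0.17, ("V", "D"): 0.29, ("S", "D"): 0.24,
    ("M", "D"): 0.03, ("I", "D"): 0.45,
}


def median_r(ability):
    """Median of all legible coefficients that involve the given ability."""
    return median(r for pair, r in R.items() if ability in pair)


for ability in ("S", "M", "I", "D"):
    print(ability, round(median_r(ability), 3))
```

Memory gives the lowest median and deductive reasoning the next lowest, which agrees with the order of independence stated in the text.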

Despite the lack of complete independence of the abilities, it is 
evident that they are, nevertheless, relatively independent. This 
becomes more apparent when we consider the relationships existing 
between the various abilities and scholastic success during the first 
semester of the freshman year. These comparisons are shown in 
Table IV. The standard error of the zero-order coefficients in this 
table is .08 when computed by the formula SE = 1/√N. By reading 
this table horizontally, the extent to which each ability is correlated 
with success in chemistry, engineering drawing, English composition, 
and mathematics, and in all of the freshman subjects combined may be 
seen. Deductive reasoning shows the highest correlation, with verbal, 
number, and inductive abilities following. Perceptual ability seems 
not to be correlated with any of these measures of scholastic success, 
nor does memory seem to be of much importance. It is probably 
surprising that space ability did not show higher correlations. 
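
The standard error quoted for Table IV is easily verified. The sketch below assumes that N is the 170 engineering freshmen listed in Table II, which the text does not state explicitly for this table; the helper names are illustrative. Expressing a coefficient in units of this standard error gives a quick sense of which entries of Table IV differ reliably from zero.

```python
import math

N_ENGINEERS = 170  # assumption: the same 170 freshmen as in Table II


def se_zero_order(n):
    """Standard error of a zero-order r by the formula SE = 1 / sqrt(N)."""
    return 1.0 / math.sqrt(n)


def in_se_units(r, n=N_ENGINEERS):
    """A correlation coefficient expressed as a multiple of its standard error."""
    return r / se_zero_order(n)


se = se_zero_order(N_ENGINEERS)
print(f"SE = {se:.3f}")
# The largest coefficient in Table IV (.44) and one of the smallest (.04).
print(f".44 is {in_se_units(0.44):.1f} standard errors from zero")
print(f".04 is {in_se_units(0.04):.1f} standard errors from zero")
```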




By reading Table IV vertically, the extent to which each of the 
subjects, and the semester average, is related to each of the tests may 
be seen. None of the tests correlated with elementary engineering 
drawing, and no multiple correlation was computed for this subject. 
For each of the other three major subjects, a multiple correlation of 
.49 was obtained. It must be noticed, however, that different combi- 
nations of abilities were used in computing these coefficients. A 
multiple correlation of .51 was found between N, V, S, I, and D 
combined and the average mark in all subjects. 


TABLE IV. TABLE SHOWING CORRELATIONS BETWEEN THE PRIMARY ABILITIES 
AND COLLEGE COURSES AND MULTIPLE CORRELATIONS WITH COLLEGE COURSES 

Ability    | Average | Chemistry | Drawing | English composition | Mathematics 
P          |  +.04   |   +.07    |  +.00   |        +.05         |    +.04 
N          |  +.32   |   +.27    |  -.01   |        +.26         |    +.27 
V          |  +.33   |   +.32    |  +.01   |        +.44         |    +.16 
S          |  +.23   |   +.19    |  +.11   |        +.11         |    +.25 
M          |  +.10   |   +.04    |  +.11   |        +.23         |    -.05 
I          |  +.34   |   +.23    |  +.18   |        +.21         |    +.29 
D          |  +.38   |   +.41    |  +.15   |        +.21         |    +.44 
Multiple R |  +.51   | 


SUMMARY 


This study of the experimental edition of Thurstone’s Primary 
Abilities Tests applied to engineering freshmen seems to justify the 
following conclusions: 

(1) The tests show sufficient reliability to justify their use in 
making comparisons of individuals on the college level. 

(2) The freshman engineering students differed from the high- 
school seniors in that they were significantly superior in deductive 
reasoning and in space, and were possibly superior in verbal ability 
and possibly inferior in inductive reasoning. 

(3) The low but positive intercorrelations indicate that the tests 
are not entirely pure measures of the primary abilities, but, despite 
the impurities, they are sufficiently independent for them to possess 
significantly different values in predicting scholastic success. 

(4) One of the primary abilities, perception, correlates so slightly 
with success in the subjects considered in this study that its use in 
predicting success in these subjects is not justified. The same is 
probably true of memory. Spatial ability also seems to be only 
slightly correlated with success during the first semester of the fresh- 
man year. On the other hand, at least four of the primary abilities— 
number, verbal, induction, and reasoning—do correlate sufficiently 
with success to justify their use. 


BIBLIOGRAPHY 


(1) DeCamp, J. E.: The Pennsylvania State College Psychological Examination. 
The Pennsylvania State College Press, State College, Pa., 1938. 

(2) Thurstone, L. L.: “Unitary Abilities.” J. Gen. Psychol., Vol. xi, 1934, pp. 
126-132. 

(3) Thurstone, L. L.: “The Factorial Isolation of Primary Abilities.” Psycho- 
metrika, Vol. 1, 1936, pp. 175-182. 

(4) Thurstone, L. L.: “A New Conception of Intelligence.” Educ. Record, 
Vol. xv, 1936, pp. 441-450. 

(5) Thurstone, L. L.: “The Isolation of Seven Mental Abilities.” Psychol. Bull., 
Vol. xxxiii, 1936, pp. 720-781. 

(6) Thurstone, L. L.: “Primary Mental Abilities.” Psychometric Monographs, 
No. 1, University of Chicago Press, 1935. 

(7) Thurstone, L. L.: Manual of Instructions for the Tests of Primary Mental 
Abilities. American Council on Education, 1938. 


RELIABILITY OF MULTIPLE-CHOICE MEASURING 
INSTRUMENTS AS A FUNCTION OF THE SPEARMAN- 
BROWN PROPHECY FORMULA, III 


H. H. REMMERS AND EDWIN EWART 
Purdue University 


In his research on attitude measurement, Likert⁴ found that by 
increasing the number of possible responses to each item on a Thurstone 
attitude scale, the reliability of the entire scale was increased. Two 
responses are possible to each item on the original Thurstone scales; 
namely, agreement or disagreement. In effect this means that 
response to each item on the instrument is recorded on a two-point 
scale running from agreement to disagreement. He found that when 
subjects were enabled to respond to each item with one of five different 
choices from complete agreement to complete disagreement, that is, 
when each subject could record his agreement-disagreement with every 
item by means of a five-point rating scale, the reliability of the entire 
instrument was markedly increased. 

As already pointed out, it occurred to the senior author that this 
increase in reliability with increase of the number of choices per item 
might be a function of the Spearman-Brown prophecy formula, and 
that the hypothesis might be extended to all multiple choice tests.° 
This study is the third in a series testing this hypothesis. 

The first study by Remmers, Karslake, and Gage⁶ examined the 
validity of this hypothesis in terms of published coefficients obtained 
for a variety of achievement tests under conditions seemingly appro- 
priate to the problem. Their results were equivocal, neither com- 
pletely substantiating nor disproving the hypothesis, although the 
weight of evidence favored it. They present a number of factors which 
might account for the discrepant cases: (1) Computational errors in the 
data examined, (2) varying methods of scoring the tests, particularly in 
regard to correction for “chance,” (3) faulty procedure in reducing the 
number of alternative choices, and (4) possible deviations from stand- 
ard procedure in administration of the tests. The necessity for more 
rigid examination of the hypothesis through experimental control of 
these variables was pointed out. 

The first controlled study of this hypothesis was undertaken by 
Denney and Remmers.¹ Four forms of a multiple-choice vocabulary 
test were constructed using the same word list, and were similar except 
that the number of responses to each word varied from five in form A 
to two in form B. The high-school students of Clinton County, 
Indiana, were divided into four groups equated as to mean IQ and 
standard deviations, and each group was given one form of the test. 

After computing the self-correlation for each form of the test (split- 
test procedure), it was found that the reliability increased as the num- 
ber of responses increased, and that this increased reliability could be 
reliably predicted by the Spearman-Brown formula. 

TABLE I 

DIVISION OF EDUCATIONAL REFERENCE 
PURDUE UNIVERSITY 
Experimental Form 4 

Name                                        Sex 

A SOCIAL STUDY 
(adapted from Manly H. Harper) 

Below are a list of propositions. At the scale at the left of each proposition 
mark your degree of agreement or disagreement with that proposition by encircling 
the appropriate number. 

Please mark each proposition even if in some cases you feel that you are merely 
guessing. 

7 indicates complete agreement 
6 indicates strong agreement 
5 indicates moderate agreement 
4 indicates indifference 
3 indicates moderate disagreement 
2 indicates strong disagreement 
1 indicates complete disagreement 

7654321 1. In teaching the vital problems of citizenship, teachers should 
so impress on the students the approved opinions in these 
matters that life’s later experiences can never unsettle or 
modify the opinions given. 

7654321 2. If our people were willing to try the experiment fairly the 
government ownership of railroads would be for the best 
interests of the country. 

7654321 3. The practice of democracy, as developed in the United States, 
has no serious or far-reaching defects. 


The present study represents an examination of the validity of this 
hypothesis as it applies to a rather widely used type of attitude 
measurement device exemplified by the Harper Social Study Scale.³ 
Four experimental forms of “A Social Study” by Manly H. Harper 
were used, identical with the original scale as far as items were con- 
cerned, differing only in that agreement-disagreement was to be 
marked on a scale at the left of the item. (See Table I.) Items on 
Form 1 were to be marked on a two-point scale (in all cases the 
numerically high end of the scale indicated agreement). Items on 
Form 2 were marked on a three-point scale, Form 3 a five-point scale, 
and Form 4 a seven-point scale. All forms of the test had identical 
directions except for adaptation to a different number of possible 
responses per item. The directions for seven responses are shown in 
Table I. 

The test was administered to eight hundred eight students in 
beginning psychology courses, four hundred eighty-seven at Purdue 
University, and three hundred twenty-one at the University of 
Nebraska. The four groups were chosen at random, four forms of the 
test being piled in order, and then passed out. There were two 
hundred five students in Group I, two hundred four in Group II, two 
hundred four in Group III and one hundred ninety-five in Group IV. 

In scoring the tests, the scores on the “plus” items (those items 
which, answered positively, would indicate liberalism) were reversed. 
That is, the scales on these items were turned end for end in scoring, so 
that a plus item on the original scale marked “7” on Form 4 would be 
scored as a “1,” a “6” becoming a “2,” etc. Thus all items were 
placed on a common scale, the lower end of which indicated a “liberal” 
answer, the upper end a “conservative” answer. To find the total 
score for a test, the marked scale values for all the items were sum- 
mated. A low total score indicated “liberalism” as measured by the 
test, a high score “conservatism.” 
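
The scoring rule just described may be put in the following form. This is a minimal sketch, not the scoring key actually used: the text does not list which items of the scale are “plus” items, so the item numbers and responses in the example are hypothetical.

```python
def score_item(marked, points, plus_item):
    """Score one item on a scale of `points` steps (2, 3, 5, or 7).

    Plus items are turned end for end, so that on every item a low
    score is the "liberal" answer and a high score the "conservative" one.
    """
    if not 1 <= marked <= points:
        raise ValueError("marked value lies outside the scale")
    return points + 1 - marked if plus_item else marked


def total_score(responses, plus_items, points):
    """Sum the scored values over all items; items are numbered from 1."""
    return sum(
        score_item(marked, points, item in plus_items)
        for item, marked in enumerate(responses, start=1)
    )


# Hypothetical example on the seven-point form: item 1 is taken to be a
# plus item, items 2 and 3 are not.
example_total = total_score([7, 2, 5], plus_items={1}, points=7)
print(example_total)
```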

Reliabilities for each form were determined by the split-half 
method, the scores for the odd items and for the even items being 
summated, and self-correlations computed using the Pearson tech- 
nique. The reliability for Form 1 was .627, for Form 2 .683, for Form 3 
.785, and Form 4 .778.* The reliability apparently increases as the 
number of possible responses increases up to five responses. The 
reliabilities for Form 3 (five responses) and for Form 4 (seven responses) 
are practically identical. 

To bear out our hypothesis, this increase up to five responses should 
be predictable, within allowable error, by the Spearman-Brown 
prophecy formula, 

$$r_A = \frac{n\,r_B}{1 + (n - 1)\,r_B}$$

where $r_A$ equals the reliability of A, $r_B$ equals the reliability of B, and 

$$n = \frac{m_A}{m_B} = \frac{\text{number of response alternatives in each item of A}}{\text{number of response alternatives in each item of B}}$$

* In our computations, reliabilities for the half-test were used, since there was 
no value in computing the reliability for the whole test. 


Three predictions were made, using the reliabilities of Forms 1, 2, and 3 
respectively as a basis. (See Table II.) Since the hypothesis appar- 
ently begins to break down with Form 4 (seven responses), no value 
was seen in using this reliability for purposes of prediction. 
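
The three sets of predictions can be reproduced from the obtained reliabilities. The sketch below applies the Spearman-Brown prophecy formula with n taken as the ratio of the number of response alternatives in the predicted form to the number in the form used as a basis; the names are illustrative. The results agree with the predicted values in Table II to within a few thousandths, the small differences presumably arising from rounding.

```python
# Obtained half-test reliabilities and response alternatives per item,
# as reported in the text for Forms 1 to 4.
OBTAINED = {1: 0.627, 2: 0.683, 3: 0.785, 4: 0.778}
ALTERNATIVES = {1: 2, 2: 3, 3: 5, 4: 7}


def prophecy(r_base, m_target, m_base):
    """Spearman-Brown prediction, with n the ratio of response alternatives."""
    n = m_target / m_base
    return n * r_base / (1.0 + (n - 1.0) * r_base)


def predict_from(base_form):
    """Predicted reliability of every other form from one base form."""
    return {
        form: prophecy(OBTAINED[base_form], ALTERNATIVES[form], ALTERNATIVES[base_form])
        for form in OBTAINED
        if form != base_form
    }


for base in (1, 2, 3):
    for form, r_pred in predict_from(base).items():
        print(f"Form {form} from Form {base}: predicted {r_pred:.3f}, "
              f"obtained {OBTAINED[form]:.3f}")
```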

Since according to Fisher² the probable errors of correlations over 
.50 tend to be too small, Fisher’s z transformation was used in comput- 
ing the allowable error in the predictions. Since there is no formula 
comparable to Shen’s for estimating the standard error of a predicted 
z, z’s for the predicted reliabilities were read directly from the table.⁵ 
However, any correction analogous to Shen’s would increase the 
standard error of the difference between the z’s of the obtained and 
estimated reliabilities. Therefore our procedure errs in the direction 
opposed to our hypothesis. 

Our statistical procedure was as follows: The z transformations for 
the obtained reliabilities and for the estimated reliabilities for each 
form were read from a table.⁵ The differences between obtained z’s 
and estimated z’s, and the standard errors of these differences, were 
computed. These values, along with the critical ratios of the differ- 
ences, appear in Table II.* 
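
The procedure may be illustrated for two entries of Table II. The sketch below computes z directly as the inverse hyperbolic tangent of r instead of reading it from a table, and takes the standard error of the difference between two z’s as the square root of 2/(N - 3), a form which reproduces the tabled values of .099 and .102. It is assumed, as the text implies but does not state, that Groups I to IV (205, 204, 204, and 195 students) received Forms 1 to 4 respectively.

```python
import math

# Assumption: Group I took Form 1, Group II Form 2, and so on.
GROUP_N = {1: 205, 2: 204, 3: 204, 4: 195}


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient."""
    return math.atanh(r)


def se_diff(n):
    """Standard error of the difference between two z's based on the same N."""
    return math.sqrt(2.0 / (n - 3))


def critical_ratio(r_obtained, r_predicted, n):
    """Absolute difference between the two z's divided by its standard error."""
    diff = abs(fisher_z(r_obtained) - fisher_z(r_predicted))
    return diff / se_diff(n)


# Form 2 and Form 4, each predicted from Form 1 (values taken from Table II).
cr_form2 = critical_ratio(0.683, 0.716, GROUP_N[2])
cr_form4 = critical_ratio(0.778, 0.852, GROUP_N[4])
print(f"Form 2 from Form 1: CR = {cr_form2:.2f}")
print(f"Form 4 from Form 1: CR = {cr_form4:.2f}")
```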

Examination of Table II indicates that all predicted reliabilities for 
Forms 1 to 3 lie within one standard deviation of the obtained reliabili- 
ties. Furthermore, none of the differences between obtained and 
predicted reliabilities for these three forms is statistically significant. 
Reliabilities for each of these three forms were predicted from the other 
two (Table II). Of the six critical ratios of the differences between 
obtained and predicted reliabilities, all lie below 1.00. 

Thus we can conclude that with the instrument and population 
used, the reliability of the attitude scale is increased as the number of 
possible responses to each item is increased up to five responses. Fur- 
thermore, this reliability can be predicted, within allowable error, by 
the Spearman-Brown prophecy formula. 

Interpretation of the data from Form 4 is more hazardous. Al- 
though the obtained reliability of Form 4 was practically identical with 
the obtained reliability of Form 3, the estimated reliability of Form 4, 
although greater than the obtained reliability, lies within the allowable 
error. The critical ratios of the differences between obtained and 
predicted reliabilities were markedly greater for Form 4, and in one 
case, where the reliability was estimated from Form 1 (Table II), the 
difference approximates statistical significance, having a critical ratio 
of 2.19. 

* Since the standard error of z is $\sigma_z = 1/\sqrt{N - 3}$, the standard error of the 
difference between two z’s, where N is the same, reduces to 
$\sigma_{\text{diff}} = \sqrt{2/(N - 3)}$. 


TABLE II. COMPARISON OF OBTAINED AND PREDICTED RELIABILITIES 

Form   | r_obt | r_pred | z_obt | σ_z  | z_pred | diff | σ_diff |  CR 

Reliabilities as predicted from Form 1 
Form 2 | .683  |  .716  |  .835 | .071 |  .899  | .064 |  .099  |  .65 
Form 3 | .785  |  .809  | 1.058 | .071 | 1.124  | .066 |  .099  |  .67 
Form 4 | .778  |  .852  | 1.040 | .072 | 1.263  | .223 |  .102  | 2.19 

Reliabilities as predicted from Form 2 
Form 1 | .627  |  .590  |  .736 | .070 |  .678  | .058 |  .099  |  .58 
Form 3 | .785  |  .782  | 1.058 | .071 | 1.050  | .008 |  .099  |  .08 
Form 4 | .778  |  .834  | 1.040 | .072 | 1.200  | .160 |  .102  | 1.56 

Reliabilities as predicted from Form 3 
Form 1 | .627  |  .594  |  .736 | .070 |  .683  | .053 |  .099  |  .53 
Form 2 | .683  |  .687  |  .835 | .071 |  .842  | .007 |  .099  |  .07 
Form 4 | .778  |  .836  | 1.040 | .072 | 1.208  | .168 |  .102  | 1.65 


The trend of the data seems to indicate that as the number of 
possible responses to each item increases beyond five, our hypothesis 
breaks down, and the reliability does not increase according to the 
Spearman-Brown formula if at all. Remmers has suggested that the 
point of this breakdown may be a function of the immediate perceptual 
or memory span of the individual. 

However, it should be emphasized that this evident breakdown is 
inferred only from a possible trend in the data, and is not a statistically 
valid conclusion. Of the three predicted reliabilities for Form 4, in 
only one case was the difference between the obtained and the predicted 
reliabilities fairly significant (critical ratio 2.19). In the other two 
cases this difference was well within the allowable error (critical ratios 
1.56 and 1.65). The possibility should be recognized that this differ- 
ence could be due to sampling error, and that the apparent breakdown 
of the hypothesis as the number of responses per item increases over 
five may be an artifact. 

Further experimental investigation will test this apparent break- 
down. It is planned in the near future to conduct an experiment com- 
paring obtained and predicted reliabilities for three-, five-, seven-, 
nine-, and eleven-response-per-item tests. 


SUMMARY 


The reliability of the Harper Social Study Scale is increased as the 
number of possible responses to each item is increased, up to five 
responses. This increase in reliability can be predicted within allow- 
able error by the Spearman-Brown prophecy formula. 

Greater differences between obtained and predicted reliabilities for 
the seven-response-per-item form, although not statistically significant, 
suggest the possible breakdown of the hypothesis as the number of 
responses is increased above five. Further investigation along this 
line is necessary. 


BIBLIOGRAPHY 


1. Denney, Hazen R. and Remmers, H. H.: “Reliability of Multiple Choice Tests 
as a Function of the Spearman-Brown Formula II.” Journal of Educational 
Psychology, December, 1940, pp. 699-704. 

2. Fisher, R. A.: Statistical Methods for Research Workers. Oliver and Boyd, 
Edinburgh, 1932. 

3. Harper, Manly H.: Social Beliefs and Attitudes of American Educators. Teach- 
ers College, Columbia University, Contributions to Education, 1927. 

4. Likert, R.: “A Technique for Measurement of Attitudes.” Archives of Psy- 
chology, No. 140, June, 1932. 

5. Lindquist, E. F.: Statistical Analysis in Educational Research. Houghton 
Mifflin Co., Boston, 1940. 

6. Remmers, H. H., Karslake, Ruth, and Gage, N. L.: “Reliability of Multiple 
Choice Measuring Instruments as a Function of the Spearman-Brown Proph- 
ecy Formula I.” Journal of Educational Psychology, November, 1940, pp. 
583-590. 


INTERRELATIONS OF VOCABULARY SKILLS: 
COMMONEST VERSUS MULTIPLE MEANINGS* 


GEORGE D. LOVELL 
Northwestern University 


ORIGIN OF THE PROBLEM 


In his book, The Measurement of Intelligence, Terman⁷ has said that 
in measuring intelligence, the vocabulary test “... has far higher 
value than any other single test of the scale.”† Since the vocabulary 
test has been shown to be closely related to or a central component of 
general intelligence, it should be interesting to determine whether 
knowledge of vocabulary constitutes one single vocabulary skill or a 
group of more specialized skills. If the latter is the case, the interrela- 
tions of these specialized skills with each other and with general intelli- 
gence should be determined. Thus far, the only vocabulary tests 
have been of an extensive nature; i.e., they have measured the number 
of words for which an individual knows the single commonest meaning. 
There has been little work done toward measuring such vocabulary fac- 
tors as knowledge of multiple meanings, discrimination in choice of 
words, and the relation of general to technical vocabularies. 


OBJECT OF THE EXPERIMENT 


As the first step in the analysis of vocabulary abilities, this experi- 
ment attempts to determine the interrelations of the knowledge of a 
single commonest meaning of a word and additional knowledge of the 
multiple meanings of the same word. 

The Multiple Meanings Test is based on a representative sampling 
of the Funk and Wagnalls unabridged dictionary, taken by Robert H. 
Seashore and Lois D. Eckerson in connection with their English 
Recognition Vocabulary Test,?* a multiple-choice type of measure of 
the number of words for which an individual knows the commonest 
meaning. These authors have shown the necessity of using an 
unabridged dictionary to determine the full recognition vocabulary, 
averaging around one hundred fifty thousand words for college under- 
graduates. Thus, in comparing single commonest with multiple 
meanings, we have the advantage of two excellent controls: (1) the 
dictionary as a source of meanings and (2) identical lists of representa- 
tive words for the two tests. (A further description of the English 
Recognition Vocabulary Test and a review of the previous studies of 
the problem can be found by referring to Eckerson² and Seashore and 
Eckerson.⁶) 

* The author wishes to express his appreciation to Dr. Robert H. Seashore for 
his valuable help and criticism in the direction of this study, done at Northwestern 
University during the academic year of 1938-1939, and appreciation to Dr. John W. 
Spargo of the Northwestern English Department for his kind criticisms and sugges- 
tions on technical points. 

† Terman, The measurement of intelligence, p. 230. 

This sampling of the dictionary is divided into three parts: Basic 
words (1) common and (2) rare, and (3) derivatives, the basic words 
being those next to the margin in bold-faced type in the dictionary, 
and derivatives being those in lighter type listed under the marginal 
words. Of these basic common words forty-three per cent were found 
to have multiple meanings ranging from two to forty-one in number 
with a mean of 5.28 after rare, archaic, obsolete, slang, and colloquial 
meanings had been excluded. The range of meanings was based on a 
count of only the numbered meanings in Funk and Wagnalls dic- 
tionary; if every possible differentiation within numbered meanings 
were considered, the range would be even greater. Meanings not 
found in Funk and Wagnalls but listed in Webster’s dictionary were 
also used in the Multiple-Meanings Test. Only basic words were 
used because multiple meanings of derivatives are relatively scarce 
and overlap greatly with those of basic words, and more particularly 
because the basic list has been well standardized. 


SELECTION OF TEST CRITERIA 


Seashore?* has found a correlation of .60 between two multiple- 
choice tests measuring basic and derivative words, respectively, and 
using the same sample of words. In a study of basic words Annen? 
found that with different word samples and different test criteria 
(multiple choice recognition, using word in illustrative sentences, defin- 
ing word in subject’s own terms, and merely checking whether the 
word were known or not), the mean number of words known did not 
vary more than eight per cent among these different tests, and that 
intercorrelations between these different and short lists ranged from 
.40 to .50. Although the Seashore-Eckerson test of commonest 
meanings employs a recognition (multiple-choice) criterion of knowl- 
edge, our own test of multiple meanings did not do so because of the 
very extensive preliminary work which would be necessary in making 
adequate item analyses of this type of test. Rather, a use criterion was 
employed, since Annen had shown a moderate correlation between all 
of the common criteria of knowledge, even when studying different 
and relatively short lists of words for each criterion. It, therefore, 
seemed reasonable to suppose that in our study, using identical and 
longer lists of words, this error would not be too great to interfere 
with the comparison of commonest versus multiple meanings. This 
seemed especially true since the direction and amount of the error 
could be allowed for in the later comparison. 

If the multiple meanings scale were found to be quite distinct from 
the commonest meaning scale, it would later be possible to construct a 
multiple-choice test as a check, but if the two skills were so closely 
related that one might be estimated from the other in most cases, there 
would be no need for this more expensive form of multiple-meanings 
test. Thus, because of its advantage in simplicity for an exploratory 
problem of this type, a use criterion test was employed in the Multiple- 
Meanings Test; subjects were asked to use the word in question in a 
sentence which would illustrate the desired meaning as suggested in a 
cue meaning. 


EXPERIMENTAL PROCEDURE 


In order to determine the number of multiple meanings for each 
word, certain working criteria of differentiation in meaning had to be 
set up. Dictionary distinctions are often so close as to represent only 
minor differences in shading of meanings. According to our criteria 
in this experiment there is a differentiation in meaning if (1) there is a 
change in part of speech; (2) there is a shift from the concrete to the 
figurative use or vice versa; (3) there is a specific technical usage; (4) 
there is a qualitative difference including a shift in connotation, a 
change in emphasis, a change from an action to a process or a person, 
or a change from one idea to an entirely different one. In the pre- 
liminary form of the test all meanings derived from any of the above 
criteria were used. 

In order to make sure that the instructions for using the suggested 
meanings in sentences were clear, a number of preliminary tests were 
given and later examined introspectively by a small number of sub- 
jects to determine whether or not people who actually knew the mean- 
ings were able to recognize what was called for in each case. 

The odd-numbered words were then put into one form of the test 
and the even-numbered ones into another. About twenty tests of each 
were administered to psychology students at Northwestern and Baylor 
universities. These revealed a few more errors in cue meanings, but, 
particularly, that approximately fourteen per cent of the meanings 
were almost universally known and that about forty-one per cent 
were almost universally not known. 

Of those not known sixty-one per cent were meanings technical to 
some field of specialization. With this evidence at hand a final form 
was made, using both of the preliminary forms, and omitting meanings 
unknown to more than eighty per cent (usually ninety per cent) of 
the subjects, as well as those known to eighty per cent, except for com- 
monest meanings universally known, which were starred. The star 
indicated that they need not be answered, since only discriminative 
items were needed to differentiate subjects. The starred meanings 
were included to orient the subject to the commonest meaning and to 
provide a check at the lower levels of ability. Eighty-two of these 
tests and the English Recognition Vocabulary Tests of Seashore and 
Eckerson were given to subjects at Northwestern and Baylor universi- 
ties. All scoring has been done by the experimenter and one trained 
assistant, using as a guide dictionary meanings for all suggested 
meanings of the test. A copy of this guide and both the English 
Recognition Vocabulary Test and the Multiple-Meanings Test are on 
file at the Charles Deering Library, Northwestern University, Evans- 
ton, Illinois.⁴ After the words had been arranged in order of difficulty, 
an odd-even reliability of r = .85 ± .03 was obtained for the test after 
application of the Spearman-Brown prophecy formula.* 
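
The rule by which meanings were retained for the final form may be sketched as follows. This is only an illustration under stated assumptions: “known to eighty per cent” is read as known to eighty per cent or more, the class and function names are invented for the purpose, and the sample meanings and proportions are hypothetical, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Meaning:
    word: str
    cue: str
    proportion_known: float      # share of preliminary subjects who knew it
    is_commonest: bool = False   # the single commonest meaning of the word


def select_for_final_form(meanings, cutoff=0.80):
    """Return (meaning, starred) pairs kept in the final form.

    Dropped: meanings unknown to more than `cutoff` of the subjects, and
    meanings known to `cutoff` or more. Commonest meanings in the latter
    class are kept but starred, that is, shown for orientation only.
    """
    kept = []
    for m in meanings:
        if m.proportion_known < 1.0 - cutoff:
            continue                      # too rarely known to discriminate
        if m.proportion_known >= cutoff:
            if m.is_commonest:
                kept.append((m, True))    # starred, need not be answered
            continue
        kept.append((m, False))           # discriminative item, scored
    return kept


# Hypothetical preliminary results for one word.
sample = [
    Meaning("run", "technical sense", 0.05),
    Meaning("run", "to move rapidly on foot", 0.95, is_commonest=True),
    Meaning("run", "a very familiar secondary sense", 0.95),
    Meaning("run", "a moderately familiar sense", 0.50),
]
final_form = [(m.cue, starred) for m, starred in select_for_final_form(sample)]
print(final_form)
```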


RESULTS 


(1) The per cent of the common basic list of words in the Seashore- 
Eckerson list having multiple meanings according to the criteria set up 
for this experiment was found to be forty-three per cent. 

(2) The range of multiple meanings was (a) in the dictionary, from 
two to forty-one numbered meanings per word; and (b) in the multiple 
meanings test, from two to sixteen meanings per word. 

(3) The average number of multiple meanings known by students 
was one hundred forty-five (SD = 23) out of a possible two hundred 
twenty-two on the multiple-meanings test. There were three hundred 
eighty-five meanings in the dictionary for the same list of words 
according to the criteria of this test. 


* Odd-even corrected reliability for English Recognition Vocabulary test 
= .84 ± .02. 

(4) The average number of meanings known per word by college 
students was found to be 2.60 (the average number of meanings per 
word, of those having multiple meanings, in the dictionary was 5.28 
and in the test, 3.96). 

(5) Correlations* for thirty-eight subjects at Northwestern and 
for forty-four subjects at Baylor, respectively, between multiple mean- 
ings and commonest meanings known as measured by this experiment 
were found to be r = .50 ± .08 and r = .59 ± .06. From the com- 
bined data a correlation of r = .68 ± .03 was obtained, the increased 
r being due partly to the difference in means of the two groups and 
partly to the increased range. 

(6) The correlation determined by Seashore?* between commonest 
meanings known and group intelligence test scores was found to range 
from .50 to .60 for a large number of cases. 

(7) Correlations were obtained between knowledge of multiple 
meanings and intelligence as measured by three different forms of the 
Ohio State Mental Alertness test at Northwestern, and by three forms 
of the A.C.E. Psychological Examination at Baylor. These r’s were 
.45 ± .08 and .43 ± .08, respectively, in spite of the fact that some 
accuracy was lost in equating scores for the different forms of the tests. 

(8) Other correlations obtained only at Northwestern which further 
indicate that knowledge of multiple meanings and single commonest 
meanings are closely related, are:† 

Multiple-Meanings Test and Iowa Reading Comprehension 
    N = 37, r = .51 ± .08 
English Recognition Vocabulary Test and Iowa Reading Comprehension 
    N = 120, r = .61 ± .04 
Multiple-Meanings Test and Iowa English Test (rhetoric) 
    N = 26, r = .50 ± .10 
English Recognition Vocabulary Test and Iowa English Test 
    N = 50, r = .61 ± .06 


* Pearson product-moment. 

† The significance of the moderately high correlation between vocabulary skills 
found in this study is corroborated by the finding of much higher intercorrelations 
between different types of vocabulary tests employed in studies by Capps (1) of 
deterioration of epileptic cases. 




Intercorrelations between the Iowa Tests and Mental Alertness 
tests for one hundred forty-one subjects at Northwestern range 
between .58 ± .035 and .78 ± .02. 


CONCLUSIONS 


In view of (1) the reliabilities of the two vocabulary tests, (2) the 
use of different types of criteria for the tests, (3) the presupposition of 
an elementary knowledge of parts of speech by subjects, and (4) the 
agreement of the Multiple-Meanings Test and other verbal tests with 
the usual correlation between verbal abilities (around .50), apparently 
intensity, or richness of vocabulary (multiple meanings), is fairly 
closely related to extensity, or knowledge of single commonest mean- 
ings, so that for general purposes we may estimate one from the other. 

In problems where further analysis is desired, knowledge of multi- 
ple meanings may be measured either by the present test employing 
the criterion of use in an illustrative sentence or by constructing a 
multiple-choice form of the test. 


BIBLIOGRAPHY 


1. Capps, H. C.: “Vocabulary changes in mental deterioration.” Archives of 
Psychol., No. 242, Sept. 1939. 

2. Eckerson, Lois D.: The estimation of individual differences in the total size of 
general English recognition vocabulary. Unpublished master’s thesis, Univer- 
sity of Southern California, 1938. 

3. Funk and Wagnalls: New Standard Dictionary of the English Language. N. Y.: 
Funk & Wagnalls Co., 1937. 

4. Lovell, G. D.: Interrelations of vocabulary skills: commonest versus multiple 
meanings. Master’s thesis, Northwestern University, 1939. 

5. Mawson, C. O. Sylvester: Roget’s thesaurus of the English language in dictionary 
form. Garden City, N. Y.: Garden City Publishing Co., 1936. 

6. Seashore, R. H., and Eckerson, Lois D.: “The measurement of individual
differences in general English vocabularies.” J. of Educ. Psychol., 31, 1940, pp.
14-38.

7. Terman, L.: The measurement of intelligence. Boston: Houghton Mifflin Co., 
1919. 

8. Webster’s New International Dictionary of the English Language (2nd ed.). Wm. 
A. Neilson, ed.-in-chief. Springfield, Mass.: G. & C. Merriam Co., 1938.


TWIN, SIBLING AND CHANCE IQ DIFFERENCES* 


JAMES D. PAGE 
The University of Rochester 


In nature-nurture studies of intelligence it is frequently desirable 
to determine the distribution of expected intra-pair IQ differences for 
twin, sibling and unrelated chance pairs. For example, how often 
may we expect IQ differences of five or twenty IQ points in identical 
twin pairs and what are the chances that two individuals selected at 
random will differ by five, ten, fifteen or fifty IQ points? These data 
may be obtained experimentally by actually testing hundreds of twin, 
sibling and unrelated pairs. However, the labor involved is great 
and the results are often invalidated by sampling errors. There is the
further objection that data so obtained are limited to the specific test 
used in the survey. 

The problem may be simplified by the use of the general formula 
for the standard deviation of a difference.¹ With reference to the
present problem, this reduces to:†


$$\sigma_{i.d.} = 1.414\,\sigma\sqrt{1 - r};$$


where “σ i.d.” is the standard deviation of the intra-pair differences or
the standard deviation of the distribution of differences, “σ” is the
“σ” of the test and consequently the “σ” of the scores in each of the
paired groups and “r” is the correlation coefficient between the pairs.

When “σ i.d.” has been determined from “r” and “σ,” the probability
that twin, sibling or chance pairs will differ by some particular
range of IQ points may be ascertained by finding the proportion of the 
total area of the distribution curve of differences which is between 
the limits of that range. This is the familiar method of finding the 
area between score points within a normal distribution curve. 
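The procedure just described can be sketched in Python. This is only an illustration: it assumes the differences are normally distributed about zero, and the function names and the σ of 15 and r of .50 in the usage lines are illustrative choices, not values prescribed at this point in the text.

```python
import math

def sd_of_differences(sigma, r):
    """Standard deviation of intra-pair differences: sqrt(2) * sigma * sqrt(1 - r)."""
    return math.sqrt(2.0) * sigma * math.sqrt(1.0 - r)

def normal_cdf(z):
    """Area of the unit normal curve below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_abs_diff_between(lo, hi, sigma, r):
    """Proportion of pairs whose absolute difference lies between lo and hi."""
    sd = sd_of_differences(sigma, r)
    if sd == 0.0:
        # r = 1: every pair has a difference of exactly zero
        return 1.0 if lo <= 0.0 < hi else 0.0
    # Both tails of the symmetric curve of differences are counted.
    return 2.0 * (normal_cdf(hi / sd) - normal_cdf(lo / sd))

# Usage: share of pairs differing by less than five points when sigma = 15, r = .50
print(round(sd_of_differences(15.0, 0.50), 2))
print(round(100.0 * prob_abs_diff_between(0.0, 5.0, 15.0, 0.50), 1))
```

Passing `math.inf` as the upper limit gives the proportion of pairs differing by at least `lo` points.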

Since the mean deviation of the distribution of differences is
.7979σ i.d., and the mean deviation of the distribution of differences


*The writer wishes to express his appreciation to Drs. Jack Dunlap and 
William Kappauf for their helpful suggestions and assistance. 


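† When “A” and “B” are scores expressed as deviations from their means
and “d” is the difference between “A” and “B,” $\sigma_d^2 = \frac{\Sigma d^2}{N} = \frac{\Sigma (A - B)^2}{N}$.

This reduces to $\sigma_d^2 = \sigma_A^2 + \sigma_B^2 - 2r\sigma_A\sigma_B$. Since $\sigma_A = \sigma_B$ the formula becomes
$\sigma_d = 1.414\,\sigma\sqrt{1 - r}$.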




gives the mean intra-pair difference directly; the latter may be easily 
obtained by the formula: 


$$\text{Mean intra-pair difference} = 1.1\,\sigma\sqrt{1 - r}$$
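The constant in this formula follows from the two already given: .7979 (that is, √(2/π)) multiplied by 1.414 is about 1.128, so the coefficient above is evidently that product rounded. A small Python check, assuming normally distributed differences; the function name is illustrative.

```python
import math

# Mean deviation of a normal curve in units of its standard deviation (0.7979...)
MEAN_DEV_FACTOR = math.sqrt(2.0 / math.pi)

def mean_intra_pair_difference(sigma, r):
    """Mean absolute intra-pair difference: 0.7979 * sqrt(2) * sigma * sqrt(1 - r)."""
    sd_of_differences = math.sqrt(2.0) * sigma * math.sqrt(1.0 - r)
    return MEAN_DEV_FACTOR * sd_of_differences

# Combined constant: 0.7979 * 1.414 = 2 / sqrt(pi)
print(round(MEAN_DEV_FACTOR * math.sqrt(2.0), 3))
```

Because the rounded and unrounded constants differ by about two and a half per cent, means computed either way can disagree in the first decimal place.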


Data obtained by the application of the “σ i.d.” formula to the 1937
Stanford-Revision (with correlation coefficients for identical twins
taken as .90, siblings .50 and unrelated pairs .00) are presented in
Table I. A difference of less than five IQ points may be expected


TABLE I. THEORETICAL DISTRIBUTION OF INTRA-PAIR IQ DIFFERENCES ON THE 1937

Distribution in per cents

Differences in IQ      Identical twins      Siblings      Chance pairs

[Table body not legible; the only surviving entries are 11.6 and 16.3.]


in forty-nine per cent of identical twins, twenty-three per cent of sibling 
pairs, and seventeen per cent of unrelated pairs. Half of the sibling 
pairs will differ by less than twelve and half of the chance pairs by 
less than seventeen IQ points. About one per cent of identical twin 
pairs will differ by more than twenty IQ points. Intra-pair differences 
of forty-five or more points may be expected in about one per cent of 
siblings and six per cent of unrelated pairs selected at random. 


On first impression the mean intra-pair difference of eighteen and 
seven-tenths IQ points for unrelated individuals may appear somewhat 
small. It is necessary to bear in mind that on the Revised Stanford- 
Binet Scale about one-fourth of the unselected population have an IQ 
between 95-105 and two-thirds an IQ between 85-115. Only about 
one in one thousand children have intelligence quotients of under 50 or 
over 150. This concentration of IQ scores about 100 limits the
possibility of extreme differences in intelligence scores for random pairs.⁴


TABLE II. COMPARISON OF THEORETICAL AND EMPIRICAL INTRA-PAIR IQ DIFFERENCES
ON THE 1916 STANFORD-BINET SCALE

                                     Distribution in per cents
                          Identical twins       Siblings          Chance pairs
Differences in IQ         Theo-     Em-       Theo-     Em-      Theo-     Em-
                          retical   pirical   retical   pirical  retical   pirical
(interval illegible)        55        55        26        28       19        22
(interval illegible)        32        29        24        25       17        20
(interval illegible)        11        11        19        18       16        15
(interval illegible)         2         3        13        13       13        17
(interval illegible)        -1         1         9         6       11         9
(interval illegible)                             3         2        6         5
(interval illegible)                             1         1        4         2
(interval illegible)                             1        -1        3         2
Mean difference             5.1       5.5      11.6      11.3     16.4      14.9
Median difference           4.6       4.6      10.1       9.4     14.3      12.7


On the 1916 Stanford-Binet Scale some empirical intra-pair data
are available. These experimentally determined differences are compared
with the expected theoretical differences in Table II. In obtaining
the latter figures the σ of the 1916 Scale was taken as 15. The
empirical identical twin cases included the fifty pairs studied by Newman,
Freeman and Holzinger² and the ninety summarized from various
sources by Schwesinger.³ All these twin pairs were presumedly reared
together. Through the courtesy of Miss A. Leila Martin, five hundred 
sibling pairs were obtained from the test files of the Child Study
Department of the Rochester Board of Education. Separated, and 
rearranged, these sibling pairs also constituted five hundred of the 
chance pairs. To these were added the one hundred sibling and one 
hundred unrelated pairs reported by Schwesinger³ and the forty-seven
sibling pairs of Newman, Freeman and Holzinger.²
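As a rough check on the theoretical columns of Table II, the proportion of pairs differing by less than five points can be recomputed. The sketch assumes the σ of 15 stated above, the same correlations used for the 1937 computation (.90, .50 and .00), and normally distributed differences; the helper name is illustrative.

```python
import math

def pct_differing_by_less_than(limit, sigma, r):
    """Per cent of pairs whose absolute difference is below `limit`."""
    sd = sigma * math.sqrt(2.0 * (1.0 - r))
    # For a normal curve centred on zero, P(|d| < limit) = erf(limit / (sd * sqrt(2)))
    return 100.0 * math.erf(limit / (sd * math.sqrt(2.0)))

groups = [("identical twins", 0.90), ("siblings", 0.50), ("chance pairs", 0.00)]
for label, r in groups:
    print(label, round(pct_differing_by_less_than(5.0, 15.0, r), 1))
```

The three results fall within about one percentage point of the tabled theoretical values of 55, 26 and 19.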

The experimentally obtained identical twin and sibling intra-pair 
differences agree very closely with the statistically predicted differences. 
The mean and median intra-pair differences obtained by the two 
methods are practically the same. The empirical intra-pair differ- 
ences for the chance pairs are somewhat smaller than the predicted 
values. This is undoubtedly the result of a sampling error. The 
five hundred cases selected from the test files of the Child Study 
Department were not a representative sample of the general popula- 
tion. They were, for the most part, retarded children who had been 
tested primarily because of their poor school work. About two-thirds 
had IQ scores between 80 and 100. Their IQ range as measured in 
terms of σ was 13.4 as compared with the generally accepted σ of 15
for the general population.

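The size of this effect can be gauged with the mean-difference formula. Assuming normality and r = .00 for chance pairs, a σ of 13.4 in place of 15 lowers the expected mean difference from about 16.9 to about 15.1 when the unrounded constant 2/√π (about 1.128) is used. The sketch is illustrative only, since the 13.4 describes the five hundred file cases and not the added Schwesinger pairs.

```python
import math

def expected_mean_difference(sigma, r=0.0):
    """Expected mean absolute intra-pair difference: (2 / sqrt(pi)) * sigma * sqrt(1 - r)."""
    return (2.0 / math.sqrt(math.pi)) * sigma * math.sqrt(1.0 - r)

general = expected_mean_difference(15.0)     # sigma accepted for the general population
restricted = expected_mean_difference(13.4)  # sigma reported for the file cases

print(round(general, 1), round(restricted, 1))
print(round(restricted / general, 3))
```

The ratio is simply 13.4/15, so any mean predicted for chance pairs shrinks by roughly eleven per cent under the restricted range.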
SUMMARY 


A technique is described for determining, by means of a formula, 
the distribution of intra-pair IQ differences for twin, sibling and ran- 
dom pairs when the correlation coefficient for the pairs and the sigma 
of the test are known. On the 1916 Stanford-Binet, data obtained by 
this formula agree completely with empirically obtained data. 

The predicted mean intra-pair differences on the 1937 Stanford- 
Binet are five and eight-tenths IQ points for identical twins, thirteen 
and two-tenths for sibling pairs and eighteen and seven-tenths for 
unrelated pairs. Intra-pair differences of less than ten IQ points may 
be expected in eighty-one per cent of identical twins, forty-five per 
cent of siblings and thirty-three per cent of unrelated pairs. 


BIBLIOGRAPHY 


(1) Garrett, H. E.: Statistics in Psychology and Education. New York: Longmans,
Green and Co., 1938, p. 193.

(2) Newman, H. H., Freeman, F. N. and Holzinger, K. J.: Twins. A Study of
Heredity and Environment. Chicago: University of Chicago Press, 1937, 
p. 369. 

(3) Schwesinger, G. C.: Heredity and Environment. New York: Macmillan Co., 
1933, p. 269. 

(4) Terman, L. M. and Merrill, M. A.: Measuring Intelligence. Boston: Houghton 
Mifflin Co., 1937, p. 461. 




BOOK REVIEWS 


GODFREY H. THOMSON. An Analysis of Performance Test Scores of a
Representative Group of Scottish Children. London: University of 
London Press, 1940, pp. 58. 


The stated purposes of this study are: (1) To submit to analysis the 
scores collected in a previously reported experiment on ‘‘The Intelli- 
gence of a Representative Group of Scottish Children” in which a 
Binet test and eight performance tests were employed, and (2) to 
give tables of data which will permit further statistical computations 
by those interested. Performance tests used were Seguin Form Board, 
Manikin Test, Stutsman Picture Test, Red Riding Hood Test, Healy 
Picture Completion Test, Knox Cube Imitation Test, Cube Construc- 
tion Test and Kohs Block Design. 

Although several of the distributions were found to depart from 
normality, the computations were made after admitting a certain 
extent of invalidation. Data on distributions, means, sums of squares, 
standard deviations, full (zero order) correlations, partial correlations, 
multiple correlations, analysis of variance, and factorial analysis are 
given. Of the single performance tests, the Cube Construction Test 
predicted Binet IQ best (r = .52 for boys and .58 for girls). Multiple 
correlation, using all eight tests, raised this prediction to .69 and .73,
respectively. The greater part of the rise was achieved, however, by 
adding only Kohs and Healy tests. There tended to be no significant 
difference between results for boys and for girls. Formulas were 
devised for predicting Binet IQ from these three performance tests, 
but the results are not highly accurate. 

The factor analysis identified two factors with a fair degree of 
certainty: (1) A general factor which may possibly but not necessarily 
be identified with g, and (2) a speed factor linking the Seguin, Manikin 
and Stutsman tests. 

The statistical analyses appear adequate. Nevertheless, as happens
occasionally in statistical studies, the psychological implications
receive slight and inadequate attention.

MILES A. TINKER.

University of Minnesota. 


SHELDON GLUECK AND ELEANOR GLUECK. Juvenile Delinquents Grown 
Up. New York: The Commonwealth Fund, 1940, pp. 330. 


Juvenile delinquents do grow up. Some grow to be criminals. 
Many outgrow their earlier delinquencies and criminal tendencies.
The personality and social characteristics of reformed and unreformed 
individuals differ. The key difference is in the level of maturity the 
behavior characteristics signify. The primary reason for the improved 
conduct of offenders with passing years is the degree of maturity 
attained regardless of age. The differential characteristics are such 
that the behavior of an individual delinquent can be predicted. The 
foregoing findings and generalizations indicate the nature of the data 
reported and interpretations made by the Gluecks in their latest con- 
tribution, Juvenile Delinquents Grown Up. 

The researches reported in this volume represent another advance 
in the significant follow-up studies of delinquent careers. In their 
previous researches, the present investigators have displayed a special 
capacity to search, find and interview delinquents years after arrest. 
In One Thousand Juvenile Delinquents they reported a study of cases 
referred to the Boston Juvenile Court during the years 1917-1922. 
The general purpose of this present volume is to investigate what has 
happened to these delinquents during this period of ten years, fifteen 
years since their first offense. Specific problems considered include a 
comparison of the characteristics of serious and minor offenders, of 
reformed and unreformed offenders, of younger and older offenders, of 
successes and failures during intramural or extramural treatment. 
Also included is a fairly comprehensive and detailed discussion of the 
possibilities for using the findings for the purpose of predicting behavior 
during various forms of peno-correctional treatment. 

Juvenile court judges are advised that the tables offered in the 
book will make it possible for them to determine the likely behavior 
of different types of offenders who appear before them and also the 
ages at which they can look for conduct changes. That judges and 
probation officers, as well as mental hygienists, can profit from a careful 
reading of this book there is no doubt. That it is advisable for them 
to make liberal use of the tables furnished as a means of individualizing 
justice is questionable without further clinical validation of the results. 

H. MELTZER.


Psychological Service Center, St. Louis, Missouri. 


EDWARD SWEETMAN. The Educational Activities in Victoria of the
Rt. Hon. H. C. E. Childers. Melbourne: Melbourne University
Press, 1940, pp. 140.


The Port Phillip District of New South Wales was within a year of 
attaining colony status under the name Victoria Colony when a young
Cambridge graduate and his bride arrived at Melbourne in 1850. 
Childers had a letter of introduction to the Superintendent of the Dis- 
trict, but no definite prospects of a position. Within three months he 
was appointed Inspector under the Denominational School Board. 
This was the first of a series of positions of an official sort which 
enabled him to initiate many educational projects during the six years 
he was in the Colony. His first report on the Denominational Schools 
was of great importance and laid the foundation for subsequent action 
in the establishment of National Schools. To Childers goes a great 
share of the responsibility for the organization of Victoria’s educational 
policy, for the founding of the University of Melbourne, and for the 
founding of the Melbourne Public Library. 

The present volume is a factual and interesting account of Childers’ 
activities during his stay in Australia. Several of his important official 
papers are reproduced in full. These read with a very modern tone, 
and indicate the farsightedness of this early educator. 

C. M. LOUTTIT.


Indiana University. 


Roma Gans. Critical Reading Comprehension in the Intermediate 
Grades. New York: Bureau of Publications, Teachers College, 
Columbia University, 1940, pp. 135. 


In the intermediate grades a widespread trend in the curriculum 
has been toward increased use of reference reading which implies the 
need of critical and selective powers on the part of the pupils. This 
emphasis stems from the movement toward an activity or unit cur- 
riculum. An analysis of the problems raised by this trend led to the 
following hypothesis which was tested experimentally: ‘The critical 
type of reading required in the selection-rejection of content for use in 
solving a problem has many of the same elements as the more common 
type of reading usually measured by reading comprehension tests but 
differs from it in certain important respects.” 

A criterion of reading comprehension was established from eight 
reading tests whose scores had a reliability of .96. To measure ability 
in reference reading a “Test of Reading Selection-Rejection” was constructed.
In the five parts of this test the materials were classified as
directly relevant, remotely relevant, fanciful, encyclopedic and sheerly 
irrelevant. Subjects were to respond in terms of (1) understanding 
the problem or project, and (2) accepting material in items as helpful
or rejecting it as not helpful. Readability (comprehensibility) of the
test was adequate for the pupils concerned. Reliabilities ranged from 
.67 to .90. Intercorrelation of the five subtests seemed “to indicate a 
pattern of selecting or rejecting rather than a process of critical 
evaluating and weighing before selecting or rejecting” the items. 
Correlations of the selection-rejection test (parts) with the criterion of 
comprehension were positive but low (.35 to .57) except for the 
remotely relevant material, where it was —.11. Thus the selection- 
rejection scores are not intimately related to reading comprehension. 
This is in line with the hypothesis. 

Factor analysis revealed that reference reading is a composite 
ability. The two main factors are reading ability (as measured on 
standardized tests) and a “selection-rejection pattern.”

Even for readers with good comprehension it was very difficult to 
select correctly the remotely relevant paragraphs and to discriminate 
the fanciful material. This suggests a marked deficiency in the teach- 
ing of critical reading. A final chapter gives the implications of the 
study as to methods of teaching reading and as to reading materials. 

This type of study is distinctly of the kind that should be encouraged
in the field of reading. Many will agree that the “traditional”
tests of reading ability, while contributing much, are far from adequate 
for giving a complete picture of the reading skills required in con- 
temporary school programs. It is now timely to devise techniques for 
measuring proficiency in these skills and to organize teaching to 
overcome deficiencies where they are discovered. While the present 
investigation may be incomplete in some respects, it is a movement 
in the right direction.

MILES A. TINKER.

University of Minnesota.


Errata 


In the article, “The Problem of Learning Readiness,” by J. A.
Lynch in the September, 1940, issue on page 440 the expression stated
as an equation should read “a function of,” rather than, “an integral
of.” Ed.


